File Discovery & Auto-Detect
SeqDesk scans a configurable data directory to discover sequencing files. It automatically pairs forward and reverse reads and matches files to samples.
How Scanning Works
The file scanner:
- Reads the configured base path (site.dataBasePath)
- Searches up to N levels deep (sequencingFiles.scanDepth, default: 2)
- Filters by allowed extensions (default: .fastq.gz, .fq.gz, .fastq, .fq)
- Skips directories matching ignore patterns (default: **/tmp/**, **/undetermined/**)
- Skips files modified within the last 30 seconds (activeWriteMinAgeMs, default: 30000) so files still being written by a sequencer are not picked up mid-write; skipped files are reported in the scan warnings
- Stops after 10,000 matching files and flags the result as truncated
- Caches results for performance (5-minute TTL)
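The scanning rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SeqDesk's actual scanner: the function names scan and is_ignored, the shape of the returned dictionary, the interpretation of depth (0 means files directly in the base path), and case-sensitive glob matching are all hypothetical.

```python
import fnmatch
import os
import time
from pathlib import Path

DEFAULT_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
DEFAULT_IGNORE = ("**/tmp/**", "**/undetermined/**")


def is_ignored(rel_path, patterns):
    # Simplified glob check (assumption): fnmatch's "*" also matches "/",
    # so "**/tmp/**" matches any path containing a "tmp" directory.
    # The leading "/" lets a "tmp" directory at the top level match too.
    probe = "/" + rel_path
    return any(fnmatch.fnmatchcase(probe, pattern) for pattern in patterns)


def scan(base_path, scan_depth=2, extensions=DEFAULT_EXTENSIONS,
         ignore_patterns=DEFAULT_IGNORE, min_age_ms=30000, max_files=10000):
    base = Path(base_path)
    now = time.time()
    result = {"files": [], "warnings": [], "truncated": False}
    for root, dirs, names in os.walk(base):
        rel_root = Path(root).relative_to(base)
        dirs.sort()  # deterministic traversal order
        if len(rel_root.parts) >= scan_depth:
            dirs[:] = []  # depth limit reached: do not descend further
        for name in sorted(names):
            if not name.endswith(tuple(extensions)):
                continue
            rel = (rel_root / name).as_posix()
            if is_ignored(rel, ignore_patterns):
                continue
            age_ms = (now - os.path.getmtime(Path(root) / name)) * 1000
            if age_ms < min_age_ms:
                # Possibly still being written: skip and report it
                result["warnings"].append(f"skipped recently modified file: {rel}")
                continue
            result["files"].append(rel)
            if len(result["files"]) >= max_files:
                result["truncated"] = True
                return result
    return result
```

Calling scan("/path/to/data") returns the relative paths of matching files together with any warnings and the truncation flag; the cache is left out here to keep the sketch focused on filtering.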
Supported File Naming
The scanner recognizes these naming patterns for R1/R2 pairing:
| Pattern | Example |
|---|---|
| {sample}_R1.fastq.gz | HG001_R1.fastq.gz |
| {sample}_R2.fastq.gz | HG001_R2.fastq.gz |
| {sample}_1.fastq.gz | HG001_1.fastq.gz |
| {sample}_2.fastq.gz | HG001_2.fastq.gz |
| {sample}_R1_001.fastq.gz | HG001_R1_001.fastq.gz |
| {sample}.R1.fastq.gz | HG001.R1.fastq.gz |
| Illumina standard | HG001_S1_L001_R1_001.fastq.gz |
The scanner strips R1/R2 indicators, lane numbers (_L001), and sample indices
(_S1) to extract the sample identifier for matching.
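As a rough illustration of this stripping step, the sketch below parses a file name into a sample identifier and a read indicator using regular expressions. The function name parse_name and the exact expressions are assumptions made for this example; the real scanner may recognize more or fewer variants.

```python
import re

EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")

# Read indicator at the end of the stem: _R1, _1 or .R1,
# optionally followed by the Illumina "_001" suffix.
READ_RE = re.compile(r"(?:_R?|\.R)([12])(?:_\d{3})?$")
LANE_RE = re.compile(r"_L\d{3}$")   # lane number, e.g. _L001
INDEX_RE = re.compile(r"_S\d+$")    # sample index, e.g. _S1


def parse_name(filename):
    """Return (sample_id, read) where read is "R1" or "R2"."""
    stem = filename
    for ext in EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    match = READ_RE.search(stem)
    if match:
        read = "R" + match.group(1)
        stem = stem[: match.start()]
    else:
        read = "R1"  # names without a read indicator are treated as R1
    stem = LANE_RE.sub("", stem)
    stem = INDEX_RE.sub("", stem)
    return stem, read
```

With these expressions, every example in the table above resolves to the sample identifier HG001.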
Pairing Logic
Files are grouped by their extracted sample identifier:
- If both R1 and R2 are found → paired-end
- If only one file → single-end (always allowed; see below)
- Files that do not match a pairing pattern are treated as R1 (single-end)
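The grouping rules can be sketched as follows. The function name pair_reads and the record and result shapes are hypothetical; the sketch also keeps only the first file seen per read direction, which is a simplification that the rules above do not specify.

```python
from collections import defaultdict


def pair_reads(records):
    """Group (sample_id, read, path) records into per-sample layouts.

    read is "R1", "R2", or None for names that match no pairing pattern.
    """
    groups = defaultdict(dict)
    for sample_id, read, path in records:
        read = read or "R1"  # unmatched names are treated as R1
        # Simplification: keep the first file seen for each read direction
        groups[sample_id].setdefault(read, path)

    result = {}
    for sample_id, reads in groups.items():
        paired = "R1" in reads and "R2" in reads
        result[sample_id] = {
            "layout": "paired-end" if paired else "single-end",
            "r1": reads.get("R1"),
            "r2": reads.get("R2"),
        }
    return result
```

For example, a sample with both HG001_R1.fastq.gz and HG001_R2.fastq.gz comes back as paired-end, while a sample with a single file comes back as single-end.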
Configuration
File discovery settings can be configured through:
- Config file: the sequencingFiles section in seqdesk.config.json
- Environment variables: SEQDESK_FILES_* variables
- Admin UI: under Data Storage settings
| Setting | Default | Description |
|---|---|---|
| allowedExtensions | .fastq.gz, .fq.gz, .fastq, .fq | File types to include |
| scanDepth | 2 | Directory levels to search |
| allowSingleEnd | true | Always forced to true; single-end files are never excluded, even if configured otherwise |
| ignorePatterns | **/tmp/**, **/undetermined/** | Glob patterns to skip |
| activeWriteMinAgeMs | 30000 | Skip files modified more recently than this (ms) |
| autoAssign | false | Whether discovery auto-assigns exact matches (see assignment page) |
allowSingleEnd is not configurable. Even when a stored value is present,
the loader overrides it to true, so unpaired (single-end) files are always
included in discovery and assignment.
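To make the override behavior concrete, here is a minimal sketch of a loader that merges stored values over the documented defaults and then forces allowSingleEnd to true. The function name load_sequencing_files_config is hypothetical, and the sketch assumes sequencingFiles is a top-level key of seqdesk.config.json; environment variables and the admin UI are not modeled.

```python
import json

# Defaults as listed in the settings table
DEFAULTS = {
    "allowedExtensions": [".fastq.gz", ".fq.gz", ".fastq", ".fq"],
    "scanDepth": 2,
    "allowSingleEnd": True,
    "ignorePatterns": ["**/tmp/**", "**/undetermined/**"],
    "activeWriteMinAgeMs": 30000,
    "autoAssign": False,
}


def load_sequencing_files_config(raw_json):
    """Merge the stored sequencingFiles section over the defaults."""
    stored = json.loads(raw_json).get("sequencingFiles", {})
    # Only known settings are merged; unknown keys are dropped (assumption)
    config = {**DEFAULTS, **{k: v for k, v in stored.items() if k in DEFAULTS}}
    # A stored value is ignored: single-end files are always included
    config["allowSingleEnd"] = True
    return config
```

Loading a file that sets scanDepth to 3 and allowSingleEnd to false would therefore yield scanDepth 3 with allowSingleEnd still true.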
Testing the Configuration
In the admin settings, you can test your data path configuration:
- Validate path — checks that the directory exists and is readable
- Count files — shows how many matching files are found
- Simulate discovery — previews what the scanner would find
This helps verify the configuration before using it in production.
Scan Caching
Scan results are cached to avoid repeated filesystem access. The cache is invalidated when:
- The data path setting changes
- File extension settings change
- A manual rescan is triggered from the admin UI
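One way to picture this behavior is a single-entry cache keyed by the data path and extension list, with the 5-minute TTL mentioned earlier. The class below is a sketch under those assumptions; ScanCache and its methods are hypothetical names, not SeqDesk's API.

```python
import time


class ScanCache:
    """Single-entry scan cache with a time-to-live."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entry = None  # (key, expires_at, result)

    @staticmethod
    def _key(base_path, extensions):
        # A change to the path or the extension set yields a different key
        return (base_path, tuple(sorted(extensions)))

    def get(self, base_path, extensions):
        if self._entry is None:
            return None
        key, expires_at, result = self._entry
        if key != self._key(base_path, extensions) or self._clock() >= expires_at:
            return None  # settings changed or entry expired: treat as a miss
        return result

    def put(self, base_path, extensions, result):
        expires_at = self._clock() + self._ttl
        self._entry = (self._key(base_path, extensions), expires_at, result)

    def invalidate(self):
        # Corresponds to a manual rescan
        self._entry = None
```

Because the key includes both settings, changing either one produces a cache miss without any explicit invalidation call; only the manual rescan needs invalidate().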