File Discovery & Auto-Detect

SeqDesk scans a configurable data directory to discover sequencing files. It automatically pairs forward and reverse reads and matches files to samples.

How Scanning Works

The file scanner:

Reads the configured base path (site.dataBasePath)
Searches up to N levels deep (sequencingFiles.scanDepth, default: 2)
Filters by allowed extensions (default: .fastq.gz, .fq.gz, .fastq, .fq)
Skips directories matching ignore patterns (default: **/tmp/**, **/undetermined/**)
Skips files modified within the last 30 seconds (activeWriteMinAgeMs, default: 30000) so that files a sequencer is still writing are not picked up mid-write; skipped files are listed in the scan warnings
Stops after 10,000 matching files and flags the result as truncated
Caches results for performance (5-minute TTL)
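
The steps above can be sketched in Python as follows. This is an illustrative sketch, not SeqDesk's actual implementation: the names `scan`, `is_ignored`, and `ScanResult` are hypothetical, depth is assumed to count subdirectory levels below the base path, and the result is assumed to be flagged as truncated only when a further matching file exists beyond the cap. Caching is left out.

```python
import fnmatch
import os
import time
from dataclasses import dataclass, field

# Defaults taken from the list above.
ALLOWED_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
IGNORE_PATTERNS = ("**/tmp/**", "**/undetermined/**")
SCAN_DEPTH = 2
ACTIVE_WRITE_MIN_AGE_MS = 30_000
MAX_FILES = 10_000


@dataclass
class ScanResult:  # hypothetical result shape
    files: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    truncated: bool = False


def is_ignored(rel_path, patterns=IGNORE_PATTERNS):
    # fnmatch's "*" also matches "/", so wrapping the path in slashes
    # approximates glob patterns such as "**/tmp/**".
    probe = "/" + rel_path.replace(os.sep, "/") + "/"
    return any(fnmatch.fnmatchcase(probe, p) for p in patterns)


def scan(base_path, now=None, max_files=MAX_FILES):
    now = time.time() if now is None else now
    base_path = os.path.abspath(base_path)
    result = ScanResult()
    for root, dirs, names in os.walk(base_path):
        rel_root = os.path.relpath(root, base_path)
        depth = 0 if rel_root == "." else rel_root.count(os.sep) + 1
        # Prune in place: do not descend past the depth limit or into
        # ignored directories.
        dirs[:] = sorted(
            d for d in dirs
            if depth < SCAN_DEPTH
            and not is_ignored(os.path.normpath(os.path.join(rel_root, d)))
        )
        for name in sorted(names):
            if not name.endswith(ALLOWED_EXTENSIONS):
                continue
            path = os.path.join(root, name)
            age_ms = (now - os.path.getmtime(path)) * 1000
            if age_ms < ACTIVE_WRITE_MIN_AGE_MS:
                result.warnings.append(f"skipped (recently modified): {path}")
                continue
            if len(result.files) >= max_files:
                result.truncated = True
                return result
            result.files.append(path)
    return result
```

Pruning `dirs` in place is what keeps `os.walk` from entering ignored or too-deep directories at all, rather than filtering their files afterwards.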

Supported File Naming

The scanner recognizes these naming patterns for R1/R2 pairing:

Pattern	Example
`{sample}_R1.fastq.gz`	`HG001_R1.fastq.gz`
`{sample}_R2.fastq.gz`	`HG001_R2.fastq.gz`
`{sample}_1.fastq.gz`	`HG001_1.fastq.gz`
`{sample}_2.fastq.gz`	`HG001_2.fastq.gz`
`{sample}_R1_001.fastq.gz`	`HG001_R1_001.fastq.gz`
`{sample}.R1.fastq.gz`	`HG001.R1.fastq.gz`
Illumina standard	`HG001_S1_L001_R1_001.fastq.gz`

The scanner strips R1/R2 indicators, lane numbers (_L001), and sample indices (_S1) to extract the sample identifier for matching.
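
As a rough illustration of this stripping step, the sketch below parses the filename patterns from the table with regular expressions. The function name `parse_name` and the exact expressions are assumptions made for this example; SeqDesk's real matching rules may differ in edge cases.

```python
import re

EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")

# Read indicator at the end of the stem: _R1 / .R1 (optionally followed
# by a three-digit chunk such as _001), or a bare _1 / _2.
READ_RE = re.compile(r"(?:[._]R([12])(?:_\d{3})?|_([12]))$")
LANE_RE = re.compile(r"_L\d{3}$")   # lane number, e.g. _L001
INDEX_RE = re.compile(r"_S\d+$")    # sample index, e.g. _S1


def parse_name(filename):
    """Return (sample_id, read), where read is 1, 2, or None if no
    read indicator was recognized."""
    stem = filename
    for ext in EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    read = None
    match = READ_RE.search(stem)
    if match:
        read = int(match.group(1) or match.group(2))
        stem = stem[: match.start()]
    stem = LANE_RE.sub("", stem)
    stem = INDEX_RE.sub("", stem)
    return stem, read


print(parse_name("HG001_S1_L001_R1_001.fastq.gz"))  # ('HG001', 1)
print(parse_name("HG001.R1.fastq.gz"))              # ('HG001', 1)
print(parse_name("HG001_2.fastq.gz"))               # ('HG001', 2)
```

The suffixes are removed from the right in a fixed order (read indicator, then lane, then sample index), which matches the layout of the Illumina standard name.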

Pairing Logic

Files are grouped by their extracted sample identifier:

If both R1 and R2 are found → paired-end
If only one file → single-end (always allowed; see allowSingleEnd under Configuration)
Files that do not match a pairing pattern are treated as R1 (single-end)
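
A minimal sketch of the grouping rules above, assuming each file has already been parsed into a `(sample_id, read, path)` tuple where `read` is 1, 2, or `None` for files without a recognized read indicator. The function name `pair_files` and the output shape are hypothetical. How SeqDesk handles several files with the same sample identifier and read number is not covered here; this sketch simply keeps the first one.

```python
from collections import defaultdict


def pair_files(parsed):
    """Group (sample_id, read, path) tuples into per-sample entries."""
    groups = defaultdict(dict)
    for sample_id, read, path in parsed:
        read = read or 1  # no pairing pattern matched: treat as R1
        groups[sample_id].setdefault(f"R{read}", path)

    result = {}
    for sample_id, reads in groups.items():
        paired = "R1" in reads and "R2" in reads
        result[sample_id] = {
            "layout": "paired-end" if paired else "single-end",
            **reads,
        }
    return result


pairs = pair_files([
    ("HG001", 1, "HG001_R1.fastq.gz"),
    ("HG001", 2, "HG001_R2.fastq.gz"),
    ("HG002", None, "HG002.fastq.gz"),
])
print(pairs["HG001"]["layout"])  # paired-end
print(pairs["HG002"]["layout"])  # single-end
```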

Configuration

File discovery settings can be configured through:

Config file — sequencingFiles section in seqdesk.config.json
Environment variables — SEQDESK_FILES_* variables
Admin UI — under Data Storage settings

Setting	Default	Description
`allowedExtensions`	`.fastq.gz`, `.fq.gz`, `.fastq`, `.fq`	File types to include
`scanDepth`	2	Directory levels to search
`allowSingleEnd`	true	Always forced to `true` — single-end files are never excluded, even if configured otherwise
`ignorePatterns`	`**/tmp/**`, `**/undetermined/**`	Glob patterns to skip
`activeWriteMinAgeMs`	30000	Skip files modified more recently than this (ms)
`autoAssign`	false	Whether discovery auto-assigns exact matches (see assignment page)

allowSingleEnd is not configurable. Even when a stored value is present, the loader overrides it to true, so unpaired (single-end) files are always included in discovery and assignment.
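
The override described above could look roughly like the following loader sketch. It assumes that `sequencingFiles` is a top-level key in seqdesk.config.json and that stored values are merged over the documented defaults; the function name `load_sequencing_files_settings` is hypothetical, and environment variables and Admin UI settings are not modeled.

```python
import json

# Defaults from the settings table.
DEFAULTS = {
    "allowedExtensions": [".fastq.gz", ".fq.gz", ".fastq", ".fq"],
    "scanDepth": 2,
    "allowSingleEnd": True,
    "ignorePatterns": ["**/tmp/**", "**/undetermined/**"],
    "activeWriteMinAgeMs": 30000,
    "autoAssign": False,
}


def load_sequencing_files_settings(config_text):
    """Merge stored settings over the defaults, then force allowSingleEnd."""
    config = json.loads(config_text)
    stored = config.get("sequencingFiles", {})
    settings = {**DEFAULTS, **stored}
    # A stored value is ignored: single-end files are never excluded.
    settings["allowSingleEnd"] = True
    return settings


stored = '{"sequencingFiles": {"scanDepth": 3, "allowSingleEnd": false}}'
settings = load_sequencing_files_settings(stored)
print(settings["scanDepth"])       # 3
print(settings["allowSingleEnd"])  # True
```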

Testing the Configuration

In the admin settings, you can test your data path configuration:

Validate path — checks that the directory exists and is readable
Count files — shows how many matching files are found
Simulate discovery — previews what the scanner would find

This helps verify the configuration before using it in production.
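
For reference, a path check of the kind performed by "Validate path" can be approximated outside the admin UI with a few standard-library calls. This is only a sketch; the function name `validate_data_path` and the messages are hypothetical, and the check SeqDesk actually runs may be stricter.

```python
import os


def validate_data_path(path):
    """Return (ok, message) for a candidate data directory."""
    if not os.path.exists(path):
        return False, "path does not exist"
    if not os.path.isdir(path):
        return False, "path is not a directory"
    # Listing a directory needs read permission; entering it needs execute.
    if not os.access(path, os.R_OK | os.X_OK):
        return False, "directory is not readable"
    return True, "ok"
```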

Scan Caching

Scan results are cached for 5 minutes to avoid repeated filesystem access. The cache is invalidated before it expires when:

The data path setting changes
File extension settings change
A manual rescan is triggered from the admin UI
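
One way to picture these rules is a single-entry cache keyed on the data path and extension list, with a 5-minute TTL and a `force` flag standing in for a manual rescan. The class name `ScanCache` and its interface are assumptions made for this sketch and do not describe SeqDesk's internals.

```python
import time

CACHE_TTL_SECONDS = 5 * 60


class ScanCache:
    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._key = None
        self._value = None
        self._stored_at = None

    def get(self, data_path, extensions, scan_fn, force=False):
        """Return the cached scan, re-running scan_fn when the entry is
        stale, the settings changed, or a rescan is forced."""
        key = (data_path, tuple(sorted(extensions)))
        now = self._clock()
        fresh = (
            self._stored_at is not None
            and self._key == key
            and now - self._stored_at < self._ttl
        )
        if force or not fresh:
            self._value = scan_fn()
            self._key = key
            self._stored_at = now
        return self._value
```

Because the path and extensions are part of the key, changing either setting misses the cache without any explicit invalidation call.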