File Discovery & Auto-Detect

SeqDesk scans a configurable data directory to discover sequencing files. It automatically pairs forward and reverse reads and matches files to samples.

How Scanning Works

The file scanner:

  1. Reads the configured base path (site.dataBasePath)
  2. Searches up to N levels deep (sequencingFiles.scanDepth, default: 2)
  3. Filters by allowed extensions (default: .fastq.gz, .fq.gz, .fastq, .fq)
  4. Skips directories matching ignore patterns (default: **/tmp/**, **/undetermined/**)
  5. Skips files modified within the last 30 seconds (activeWriteMinAgeMs, default 30000) so files still being written by a sequencer are not picked up mid-write; skipped files are reported in the scan warnings
  6. Stops after 10,000 matching files and flags the result as truncated
  7. Caches results for performance (5-minute TTL)
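
The steps above can be sketched roughly as follows. This is a minimal illustration, not SeqDesk's actual scanner: the function names `scan` and `is_ignored` are hypothetical, depth is assumed to count directory levels below the base path (files directly in the base path are at depth 0), and glob matching is simplified to `fnmatch`, which may differ from the glob semantics SeqDesk uses.

```python
import fnmatch
import os
import time
from pathlib import Path

ALLOWED_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
IGNORE_PATTERNS = ("**/tmp/**", "**/undetermined/**")
MAX_FILES = 10_000


def is_ignored(rel_dir, patterns=IGNORE_PATTERNS):
    # Wrap in slashes so "**/tmp/**" also matches a top-level "tmp" directory.
    candidate = f"/{rel_dir}/"
    return any(fnmatch.fnmatchcase(candidate, p) for p in patterns)


def scan(base_path, scan_depth=2, min_age_ms=30_000, max_files=MAX_FILES, now=None):
    """Return (files, warnings, truncated) for a hypothetical scan."""
    base = Path(base_path)
    now = time.time() if now is None else now
    files, warnings = [], []
    for root, dirs, names in os.walk(base):
        rel_root = Path(root).relative_to(base)
        depth = len(rel_root.parts)
        # Prune ignored directories and anything deeper than scan_depth.
        dirs[:] = sorted(
            d for d in dirs
            if depth < scan_depth and not is_ignored((rel_root / d).as_posix())
        )
        for name in sorted(names):
            if not name.endswith(ALLOWED_EXTENSIONS):
                continue
            path = Path(root) / name
            age_ms = (now - path.stat().st_mtime) * 1000
            if age_ms < min_age_ms:
                # Possibly still being written; report instead of including.
                warnings.append(f"skipped (recently modified): {path}")
                continue
            if len(files) >= max_files:
                # More matches exist than the limit allows: flag as truncated.
                return files, warnings, True
            files.append(path)
    return files, warnings, False
```

Calling `scan("/data/sequencing")` (an example path) would return the matching files, any skip warnings, and a flag indicating whether the file limit was hit. Caching is left out of this sketch.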

Supported File Naming

The scanner recognizes these naming patterns for R1/R2 pairing:

| Pattern | Example |
| --- | --- |
| {sample}_R1.fastq.gz | HG001_R1.fastq.gz |
| {sample}_R2.fastq.gz | HG001_R2.fastq.gz |
| {sample}_1.fastq.gz | HG001_1.fastq.gz |
| {sample}_2.fastq.gz | HG001_2.fastq.gz |
| {sample}_R1_001.fastq.gz | HG001_R1_001.fastq.gz |
| {sample}.R1.fastq.gz | HG001.R1.fastq.gz |
| Illumina standard | HG001_S1_L001_R1_001.fastq.gz |

The scanner strips R1/R2 indicators, lane numbers (_L001), and sample indices (_S1) to extract the sample identifier for matching.
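
One way to picture this extraction is a single regular expression applied to the file name after its extension is removed. The sketch below is an assumption about how such parsing could work, not SeqDesk's implementation; `READ_RE` and `parse_read_file` are hypothetical names, and files that do not match are returned as read 1.

```python
import re

EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")

# Sample name, optional _S<n> index, optional _L<nnn> lane,
# a read indicator (_R1, .R1, _1, ...), and an optional _001 chunk suffix.
READ_RE = re.compile(
    r"^(?P<sample>.+?)"
    r"(?:_S\d+)?"
    r"(?:_L\d{3})?"
    r"[._](?:R(?P<r>[12])|(?P<n>[12]))"
    r"(?:_\d{3})?$"
)


def parse_read_file(filename):
    """Return (sample_id, read) where read is 1 or 2."""
    stem = filename
    for ext in EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    match = READ_RE.match(stem)
    if match is None:
        # No recognizable read indicator: treat the whole stem as the sample.
        return stem, 1
    return match.group("sample"), int(match.group("r") or match.group("n"))


print(parse_read_file("HG001_S1_L001_R1_001.fastq.gz"))  # ('HG001', 1)
print(parse_read_file("HG001_2.fastq.gz"))               # ('HG001', 2)
```

All of the patterns in the table above resolve to the sample identifier `HG001` under this sketch.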

Pairing Logic

Files are grouped by their extracted sample identifier:

  • If both R1 and R2 are found → paired-end
  • If only one file → single-end (always allowed; see below)
  • Files that do not match a pairing pattern are treated as R1 (single-end)
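
The grouping rules above can be illustrated with a short sketch. It assumes file names have already been parsed into `(sample_id, read, filename)` tuples; `pair_reads` is a hypothetical helper, and keeping the first file seen per read is an assumption, since the handling of duplicates is not described here.

```python
from collections import defaultdict


def pair_reads(parsed):
    """Group (sample_id, read, filename) tuples into per-sample layouts."""
    groups = defaultdict(dict)
    for sample_id, read, filename in parsed:
        # Assumption: the first file seen for a given read wins.
        groups[sample_id].setdefault(read, filename)

    result = {}
    for sample_id, reads in groups.items():
        layout = "paired-end" if 1 in reads and 2 in reads else "single-end"
        result[sample_id] = {
            "layout": layout,
            "r1": reads.get(1),
            "r2": reads.get(2),
        }
    return result


pairs = pair_reads([
    ("HG001", 1, "HG001_R1.fastq.gz"),
    ("HG001", 2, "HG001_R2.fastq.gz"),
    ("HG002", 1, "HG002.fastq.gz"),  # no pairing pattern, treated as R1
])
print(pairs["HG001"]["layout"])  # paired-end
print(pairs["HG002"]["layout"])  # single-end
```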

Configuration

File discovery settings can be configured through:

  • Config file: the sequencingFiles section in seqdesk.config.json
  • Environment variables: SEQDESK_FILES_* variables
  • Admin UI: under Data Storage settings

| Setting | Default | Description |
| --- | --- | --- |
| allowedExtensions | .fastq.gz, .fq.gz, .fastq, .fq | File types to include |
| scanDepth | 2 | Directory levels to search |
| allowSingleEnd | true | Always forced to true; single-end files are never excluded, even if configured otherwise |
| ignorePatterns | **/tmp/**, **/undetermined/** | Glob patterns to skip |
| activeWriteMinAgeMs | 30000 | Skip files modified more recently than this (ms) |
| autoAssign | false | Whether discovery auto-assigns exact matches (see assignment page) |

allowSingleEnd is not configurable. Even when a stored value is present, the loader overrides it to true, so unpaired (single-end) files are always included in discovery and assignment.
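
The override behavior can be pictured with a small sketch. This is not SeqDesk's loader: `DEFAULTS` mirrors the documented defaults, while `load_sequencing_files_config` and the shape of the stored settings are assumptions made for illustration.

```python
from typing import Optional

# Defaults as documented for the sequencingFiles section.
DEFAULTS = {
    "allowedExtensions": [".fastq.gz", ".fq.gz", ".fastq", ".fq"],
    "scanDepth": 2,
    "allowSingleEnd": True,
    "ignorePatterns": ["**/tmp/**", "**/undetermined/**"],
    "activeWriteMinAgeMs": 30000,
    "autoAssign": False,
}


def load_sequencing_files_config(stored: Optional[dict]) -> dict:
    """Merge stored settings over the defaults (hypothetical loader)."""
    config = {**DEFAULTS, **(stored or {})}
    # A stored allowSingleEnd value is ignored: it is always forced to True.
    config["allowSingleEnd"] = True
    return config


config = load_sequencing_files_config({"scanDepth": 3, "allowSingleEnd": False})
print(config["scanDepth"])       # 3
print(config["allowSingleEnd"])  # True
```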

Testing the Configuration

In the admin settings, you can test your data path configuration:

  • Validate path — checks that the directory exists and is readable
  • Count files — shows how many matching files are found
  • Simulate discovery — previews what the scanner would find

This helps verify the configuration before using it in production.

Scan Caching

Scan results are cached for 5 minutes to avoid repeated filesystem access. Before that TTL expires, the cache is also invalidated when:

  • The data path setting changes
  • File extension settings change
  • A manual rescan is triggered from the admin UI
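
As a rough illustration of this behavior, the sketch below keys a single cached result on the data path and extension list, expires it after a TTL, and exposes an explicit invalidation hook for manual rescans. `ScanCache` and its methods are hypothetical names, and the single-entry design is an assumption rather than a description of SeqDesk's cache.

```python
import time


class ScanCache:
    """Single-entry scan cache with a TTL (illustrative sketch)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._key = None
        self._value = None
        self._stored_at = 0.0

    def get(self, base_path, extensions):
        key = (base_path, tuple(extensions))
        if self._key != key:
            # Data path or extension settings changed: treat as a miss.
            return None
        if self._clock() - self._stored_at >= self._ttl:
            # Entry is older than the TTL.
            return None
        return self._value

    def put(self, base_path, extensions, result):
        self._key = (base_path, tuple(extensions))
        self._value = result
        self._stored_at = self._clock()

    def invalidate(self):
        # Called when a manual rescan is requested.
        self._key = None
        self._value = None
```

A caller would check `get(...)` first, run the scan on a miss, and store the result with `put(...)`.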