Assigning Files to Samples

After file discovery, sequencing files need to be linked to the correct samples. SeqDesk provides both automatic matching and manual assignment. Both are available to facility admins on the Sequencing tab of a sequencing order (/orders/[id]/sequencing).

Auto-Detect Matching

Matching is barcode-first. For each sample, the engine tries these sources in order and stops at the first that produces a match:

Run-plan barcode — the barcode assigned to the sample on a sequencing run plan. Files are matched by looking for the barcode (and, when known, the run ID) in the file’s directory path. (matchedBy: run-plan-barcode)
Sample barcode — a _barcode value stored on the sample’s custom fields. (matchedBy: sample-barcode)
Sample identifier — falls back to fuzzy filename matching against sampleId, sampleAlias, and sampleTitle. (matchedBy: sample-id)
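
The ordered fallback above can be sketched as follows. This is a minimal illustration, not SeqDesk's actual engine: the Sample class, match_by_path, and match_sample are hypothetical names, sample-barcode matching is assumed to search the file path in the same way as run-plan barcodes, and the fuzzy identifier fallback is reduced to a plain case-insensitive substring test.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    # Hypothetical shape; only the fields needed for matching are modelled.
    sample_id: str
    run_plan_barcode: Optional[str] = None  # barcode from a sequencing run plan
    run_id: Optional[str] = None            # run ID, when known
    custom_fields: dict = field(default_factory=dict)  # may hold "_barcode"


def match_by_path(token: str, paths: list[str], run_id: Optional[str] = None) -> list[str]:
    """Return paths that contain the token (and the run ID, when one is given)."""
    return [p for p in paths if token in p and (run_id is None or run_id in p)]


def match_sample(sample: Sample, paths: list[str]) -> tuple[Optional[str], list[str]]:
    """Try each source in order and stop at the first that produces a match."""
    if sample.run_plan_barcode:
        hits = match_by_path(sample.run_plan_barcode, paths, sample.run_id)
        if hits:
            return "run-plan-barcode", hits

    barcode = sample.custom_fields.get("_barcode")
    if barcode:
        # Assumption: sample barcodes are also looked up in the file path.
        hits = match_by_path(barcode, paths)
        if hits:
            return "sample-barcode", hits

    # Stand-in for the fuzzy identifier fallback (sampleAlias and sampleTitle omitted).
    hits = [p for p in paths if sample.sample_id.lower() in p.lower()]
    if hits:
        return "sample-id", hits

    return None, []
```

Because the function returns as soon as one source matches, a sample with a run-plan barcode never reaches the identifier fallback, which keeps filename coincidences from overriding barcode evidence.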

Match status and confidence

Each suggestion has a status and a confidence score (0–1):

Status	Meaning
`exact`	A single confident match (paired ≈ 0.99, single-end ≈ 0.92 for barcode matches; ≥ 0.7 score for identifier matches)
`partial`	A weak/single-end match that needs review
`ambiguous`	Multiple candidate pairs matched — no auto-pick; the alternatives are listed for manual choice
`none`	No candidate files matched

For the identifier fallback, the score is computed by string similarity: an exact normalized match scores 1.0; a filename that contains the sample identifier scores 0.5–0.9; partial overlaps score lower. Two or more pairs scoring ≥ 0.7 are reported as ambiguous.
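
As a rough sketch of the identifier fallback, the code below scores one identifier against one filename stem and then maps a sample's candidate scores to a status. Only the 1.0 exact score, the 0.5–0.9 containment band, and the 0.7 threshold come from the description above. The normalization rule, the interpolation inside the band, the use of difflib for partial overlaps, and the choice of partial for a single weak candidate are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def score(identifier: str, file_stem: str) -> float:
    """Score one identifier against one filename stem."""
    ident, stem = normalize(identifier), normalize(file_stem)
    if not ident or not stem:
        return 0.0
    if ident == stem:
        return 1.0  # exact normalized match
    if ident in stem:
        # Assumed interpolation that stays inside the documented 0.5-0.9 band.
        return 0.5 + 0.4 * len(ident) / len(stem)
    # Assumed: partial overlaps are scaled to stay below 0.5.
    return 0.5 * SequenceMatcher(None, ident, stem).ratio()


def classify(scores: list[float], threshold: float = 0.7) -> str:
    """Map the candidate pair scores for one sample to a suggestion status."""
    if not scores:
        return "none"
    strong = [s for s in scores if s >= threshold]
    if len(strong) >= 2:
        return "ambiguous"  # two or more pairs at or above 0.7: no auto-pick
    if len(strong) == 1:
        return "exact"
    return "partial"  # assumption: weak candidates are surfaced for review
```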

Auto-assignment

Auto-assignment is off by default (autoAssign: false). When enabled (per request or in config), discovery auto-assigns a sample only when all of the following hold:

the suggestion status is exact, and
confidence is ≥ 0.9, and
a Read 1 file is present.

Everything else (partial, ambiguous, lower confidence) is surfaced for manual review and left unassigned. Samples that already have an assigned read are skipped unless you force a re-scan.
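
The gate described above amounts to a small predicate. The sketch below uses hypothetical names (Suggestion, should_auto_assign, and the already_assigned and force flags); only the conditions themselves are taken from this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    # Hypothetical shape of a match suggestion for one sample.
    status: str            # "exact", "partial", "ambiguous", or "none"
    confidence: float      # 0 to 1
    read1: Optional[str]   # path to the Read 1 file, if any
    read2: Optional[str] = None


def should_auto_assign(
    suggestion: Suggestion,
    *,
    auto_assign: bool = False,      # off by default, as in the config
    already_assigned: bool = False,
    force: bool = False,            # forced re-scan
) -> bool:
    """Return True only when every auto-assignment condition holds."""
    if not auto_assign:
        return False
    if already_assigned and not force:
        return False  # samples with an assigned read are skipped
    return (
        suggestion.status == "exact"
        and suggestion.confidence >= 0.9
        and suggestion.read1 is not None
    )
```

Note that an identifier match can be exact at a score between 0.7 and 0.9. Under these conditions it is still left for manual review, because the confidence check is stricter than the status check.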

Manual Assignment

For files that do not auto-match or that need correction, use the sequencing order’s Sequencing tab:

Browse the discovered files / suggestions list
For each file pair (R1/R2), select the target sample
Confirm the assignment

The view shows, per sample:

File path relative to the data base path
File size
Current assignment status and read data class
The matched-by source and confidence for any suggestion

What Gets Stored

When files are assigned, a Read record is created linking the files to the sample:

Field	Value
`file1`	Path to forward reads (R1)
`file2`	Path to reverse reads (R2), null for single-end
`checksum1/checksum2`	MD5 checksums for integrity verification
`sampleId`	The assigned sample
`dataClass`	`cleaned` for associated/uploaded reads (see below)
`dataClassSource`	How the class was set (`associate`, `upload`, `pipeline`, `manual`, …)
`isActive`	Whether this is the sample’s current active read

File paths are stored relative to the data base path and resolved at runtime. Assigning reads moves the sample’s facility status from WAITING/PROCESSING to SEQUENCED.
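
To make the stored fields concrete, here is a minimal sketch of building such a record: paths are made relative to the data base path, MD5 checksums are computed from the file contents, and the stored path is resolved again at runtime. The helper names (md5_of, build_read_record, resolve) and the plain-dict record are illustrative assumptions; only the field names come from the table above.

```python
import hashlib
from pathlib import Path
from typing import Optional


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute an MD5 hex digest in chunks so large FASTQ files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_read_record(base_path: Path, sample_id: str, r1: Path, r2: Optional[Path] = None) -> dict:
    """Build a Read-like record with paths stored relative to the data base path."""
    return {
        "sampleId": sample_id,
        "file1": r1.relative_to(base_path).as_posix(),
        "file2": r2.relative_to(base_path).as_posix() if r2 else None,  # null for single-end
        "checksum1": md5_of(r1),
        "checksum2": md5_of(r2) if r2 else None,
        "dataClass": "cleaned",          # default for associated reads
        "dataClassSource": "associate",
        "isActive": True,
    }


def resolve(base_path: Path, stored_path: str) -> Path:
    """Resolve a stored relative path against the current data base path."""
    return base_path / stored_path
```

Storing relative paths means the data base path can be moved or remounted without rewriting existing records.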

Raw vs Cleaned reads

Each Read carries a data class that protects original sequencer output from being silently overwritten by derived (cleaned) reads.

`dataClass`	Meaning
`cleaned`	Processed / analysis-ready reads (the default for associate and upload)
`raw`	Original sequencer output — protected
`unknown`	Unclassified — also protected

raw and unknown are protected classes. When you assign or upload a cleaned read over a sample whose current active read is protected (and the files differ), SeqDesk does not overwrite it. Instead it:

creates a new read record for the cleaned files and marks it active, and
sets isActive = false on the previously active raw/unknown read and points its supersededByReadId at the new read.

The protected read is preserved as provenance, not deleted. Replacing a non-protected (cleaned) read in place is allowed.
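
The supersede-versus-replace behaviour can be sketched as below. The Read class and assign_cleaned function are hypothetical, and two details are assumptions rather than documented behaviour: re-assigning identical files is treated as a no-op, and the caller supplies the next record ID.

```python
from dataclasses import dataclass
from typing import Optional

PROTECTED = {"raw", "unknown"}


@dataclass
class Read:
    # Hypothetical in-memory stand-in for a Read record.
    id: int
    file1: str
    file2: Optional[str]
    data_class: str
    is_active: bool = True
    superseded_by_read_id: Optional[int] = None


def assign_cleaned(reads: list[Read], file1: str, file2: Optional[str], next_id: int) -> Read:
    """Assign cleaned files to a sample, preserving protected reads as provenance."""
    current = next((r for r in reads if r.is_active), None)

    if current is None:
        new = Read(next_id, file1, file2, "cleaned")
        reads.append(new)
        return new

    if (current.file1, current.file2) == (file1, file2):
        return current  # assumption: identical files leave the record untouched

    if current.data_class in PROTECTED:
        # Do not overwrite: create a new active read and supersede the old one.
        new = Read(next_id, file1, file2, "cleaned")
        current.is_active = False
        current.superseded_by_read_id = new.id
        reads.append(new)
        return new

    # Non-protected (cleaned) reads may be replaced in place.
    current.file1, current.file2 = file1, file2
    return current
```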

Choosing the active read

A sample can have several Read records. The active read is selected by preferring, in order: (1) an active cleaned read with a file, (2) any active read with a file, (3) any read with a file. Downstream delivery and pipelines use the read selected this way.
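
A minimal sketch of that preference order follows. The Read class and select_read function are hypothetical, and "with a file" is assumed to mean that file1 is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    # Hypothetical stand-in; only the fields used for selection are modelled.
    id: int
    file1: Optional[str]
    data_class: str
    is_active: bool


def select_read(reads: list[Read]) -> Optional[Read]:
    """Pick the read that delivery and pipelines would use, or None."""
    tiers = (
        lambda r: r.is_active and r.data_class == "cleaned" and bool(r.file1),
        lambda r: r.is_active and bool(r.file1),
        lambda r: bool(r.file1),
    )
    # Walk the tiers from most to least preferred; the first match wins.
    for matches in tiers:
        for read in reads:
            if matches(read):
                return read
    return None
```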

Manual re-classification

Facility admins can manually re-classify a read’s data class (for example, marking an associated read as raw). This sets dataClass, dataClassSource = manual, and records who changed it and when. Manual re-classification updates the read in place — it does not supersede it.

Bulk Assignment

For sequencing orders with many samples, discovery can process all samples at once:

Open the sequencing order’s Sequencing tab
Click Discover Files
The system scans and suggests matches for all samples
Review the suggestions
Confirm assignments (or enable auto-assign to apply qualifying exact matches automatically — see the conditions above)

Partial, ambiguous, and low-confidence matches are left unassigned for manual review.

Cleaned reads from pipelines

The shipped read-cleaning pipeline does not write to Read records directly. Instead it produces cleaned-read candidates (artifacts flagged as sample_read_candidate). A facility admin reviews them under the run’s pending writebacks and promotes the chosen candidates (GET/POST /api/pipelines/runs/[id]/pending-writebacks). Promotion copies the cleaned files into the data directory and creates a new active cleaned read (with dataClassSource = pipeline), superseding the prior active read while preserving any protected raw/unknown reads.

Requirements for Pipelines

Before a pipeline can run on a study:

All included samples must have at least one read record
Read files must exist at the configured paths
For paired-end pipelines, both R1 and R2 must be assigned

The pipeline launcher validates these requirements before allowing a run to start. See Running a Pipeline.
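
The three requirements can be expressed as a pre-flight check along the following lines. This is an illustrative sketch only: validate_for_pipeline and its inputs are hypothetical, and each sample is assumed to map to the single read the pipeline would use (or None when it has no read record).

```python
from pathlib import Path
from typing import Optional


def validate_for_pipeline(
    reads_by_sample: dict[str, Optional[dict]],
    base_path: Path,
    paired_end: bool,
) -> list[str]:
    """Return a list of problems; an empty list means the run may start."""
    problems = []
    for sample_id, read in reads_by_sample.items():
        if read is None:
            problems.append(f"{sample_id}: no read record")
            continue
        if paired_end and not (read.get("file1") and read.get("file2")):
            problems.append(f"{sample_id}: paired-end pipeline needs both R1 and R2")
        for key in ("file1", "file2"):
            rel = read.get(key)
            # Stored paths are relative, so resolve them against the base path.
            if rel and not (base_path / rel).is_file():
                problems.append(f"{sample_id}: {key} not found at {rel}")
    return problems
```

Collecting every problem instead of stopping at the first lets an admin fix all affected samples in one pass before retrying the launch.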