Results: Assemblies & Bins
After a pipeline completes, SeqDesk parses the outputs and links them to samples in the database. This page covers the result types and their data structure.
Assemblies
An assembly represents a set of assembled contigs for a sample.
| Field | Description |
|---|---|
| Assembly Name | Identifier (usually derived from sample name) |
| Assembly File | Path to the contig FASTA file |
| Assembly Accession | ENA accession if submitted (GCA…) |
| Sample | The sample this assembly belongs to |
| Pipeline Run | The run that produced this assembly |
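As a rough illustration of the record shape, the fields above could be mirrored in Python as follows. This is a sketch, not SeqDesk's actual schema: the class name, the snake_case attribute names, and the example values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assembly:
    """Hypothetical in-memory mirror of the assembly fields in the table above."""

    assembly_name: str                        # usually derived from the sample name
    assembly_file: str                        # path to the contig FASTA file
    sample_id: str                            # sample this assembly belongs to
    pipeline_run_id: str                      # run that produced this assembly
    assembly_accession: Optional[str] = None  # ENA accession, only once submitted

    @property
    def is_submitted(self) -> bool:
        # Treat an assembly as submitted once it carries an ENA accession.
        return self.assembly_accession is not None


asm = Assembly(
    assembly_name="sample1",
    assembly_file="runs/run42/Assembly/MEGAHIT/sample1.contigs.fa.gz",
    sample_id="sample1",
    pipeline_run_id="run42",
)
print(asm.is_submitted)  # False
```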
Preferred Assembly
Each sample can have a preferred assembly — the assembly used for downstream analysis and ENA submission. When multiple assemblies exist (e.g., from MEGAHIT and SPAdes), the facility admin selects which one to use.
Output Location
MAG pipeline assemblies are found at:
{runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz
Genome Bins (MAGs)
Genome bins are individual genomes extracted from metagenomic assemblies through binning algorithms.
| Field | Description |
|---|---|
| Bin Name | Identifier (e.g., sample1.001) |
| Bin File | Path to the bin FASTA file |
| Bin Accession | ENA accession if submitted |
| Completeness | CheckM completeness score (0–100%) |
| Contamination | CheckM contamination score (0–100%) |
| Sample | The sample this bin belongs to |
| Pipeline Run | The run that produced this bin |
Quality Metrics
Bin quality is assessed by CheckM, which estimates:
- Completeness — what percentage of a complete genome is present
- Contamination — what percentage of the bin is from other organisms
Common quality thresholds:
| Category | Completeness | Contamination |
|---|---|---|
| High quality | ≥ 90% | < 5% |
| Medium quality | ≥ 50% | < 10% |
| Low quality | < 50% | any |
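The thresholds can be applied mechanically. The sketch below assumes a hypothetical Bin record holding the two CheckM scores as percentages. Note that the table does not cover bins with at least 50% completeness but 10% or more contamination; the sketch labels those "unclassified" instead of guessing where SeqDesk would place them.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical bin record with the two CheckM scores used for grading."""

    bin_name: str
    completeness: float   # CheckM completeness, percent
    contamination: float  # CheckM contamination, percent


def quality_category(b: Bin) -> str:
    """Apply the thresholds from the table above, most stringent first."""
    if b.completeness >= 90 and b.contamination < 5:
        return "high"
    if b.completeness >= 50 and b.contamination < 10:
        return "medium"
    if b.completeness < 50:
        return "low"
    # >= 50% complete but >= 10% contaminated: not covered by the table.
    return "unclassified"


# Made-up scores, purely to exercise each branch.
for b in [
    Bin("sample1.001", 96.2, 1.3),
    Bin("sample1.002", 71.0, 4.0),
    Bin("sample1.003", 32.5, 0.8),
    Bin("sample1.004", 88.0, 14.0),
]:
    print(b.bin_name, quality_category(b))
```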
Bin Sources
The MAG pipeline can produce bins from multiple binning tools:
- MetaBAT2 — default binning
- MaxBin2 — alternative binning
- CONCOCT — optional (disabled by default)
- DAS Tool — bin refinement, combines results from other tools
When DAS Tool has run, its refined bins (at GenomeBinning/DASTool/bins/) are preferred over the bins from the individual binning tools.
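One way to express the DAS Tool preference in code is a priority list over bin sources. Only the preference for DAS Tool comes from this page; the fallback order after it (MetaBAT2, then MaxBin2, then CONCOCT) and the function name are assumptions for illustration.

```python
from typing import Optional

# DAS Tool first, as described above. The order of the fallbacks is an
# assumption, not documented SeqDesk behavior.
BIN_SOURCE_PRIORITY = ["DASTool", "MetaBAT2", "MaxBin2", "CONCOCT"]


def pick_bin_source(bins_by_tool: dict[str, list[str]]) -> Optional[str]:
    """Return the highest-priority tool that produced at least one bin."""
    for tool in BIN_SOURCE_PRIORITY:
        if bins_by_tool.get(tool):  # missing or empty lists are skipped
            return tool
    return None


found = {"MetaBAT2": ["sample1.001", "sample1.002"], "DASTool": ["sample1.001"]}
print(pick_bin_source(found))                         # DASTool
print(pick_bin_source({"MaxBin2": ["sample1.001"]}))  # MaxBin2
```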
Pipeline Artifacts
Beyond assemblies and bins, pipeline runs produce additional artifacts:
| Type | Description |
|---|---|
| reads | Processed/filtered reads |
| assembly | Assembled contigs |
| bins | Genome bins |
| qc_report | Quality control reports (MultiQC, FastQC) |
| alignment | BAM alignment files |
Each artifact tracks:
- File path, size, and checksum
- Which pipeline step produced it
- Associated study and sample
- Tool-specific metadata (JSON)
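To make the tracked attributes concrete, here is a minimal sketch of an artifact record plus a helper that fills in size and checksum from the file on disk. The page does not say which checksum algorithm SeqDesk uses; SHA-256 is an assumption here, as are the class, field, and step names.

```python
import hashlib
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical artifact record mirroring the tracked attributes above."""

    kind: str          # reads, assembly, bins, qc_report, or alignment
    path: str
    size: int          # bytes
    checksum: str      # SHA-256 hex digest (assumed algorithm)
    step: str          # pipeline step that produced the file
    study_id: str
    sample_id: str
    metadata: dict = field(default_factory=dict)  # tool-specific, JSON-serializable


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large FASTQ or BAM files are never read whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_artifact(kind, path, step, study_id, sample_id, metadata=None) -> Artifact:
    return Artifact(
        kind=kind,
        path=path,
        size=os.path.getsize(path),
        checksum=sha256_of(path),
        step=step,
        study_id=study_id,
        sample_id=sample_id,
        metadata=metadata or {},
    )


# Usage with a throwaway file standing in for a BAM output.
with tempfile.NamedTemporaryFile("wb", suffix=".bam", delete=False) as tmp:
    tmp.write(b"example")
art = make_artifact("alignment", tmp.name, "align_reads", "study1", "sample1", {"tool": "bowtie2"})
os.remove(tmp.name)
print(art.size)                  # 7
print(json.dumps(art.metadata))  # {"tool": "bowtie2"}
```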
Read Outputs
Order pipelines can also produce read-level results. SeqDesk distinguishes metadata updates from new read files:
| Result kind | Behavior |
|---|---|
| sample_read_metadata | Updates the active read record with checksums, counts, QC metrics, or report links |
| sample_read_candidate | Stores new FASTQ files as run artifacts and marks them as pending review |
| sample_read_replace | Creates or supersedes canonical read records when the run completes |
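The practical difference between the three result kinds is when, if ever, the sample's active FASTQ files change. A small sketch of that rule, using a hypothetical enum and function name (the string values are the result kinds listed above):

```python
from enum import Enum


class ReadResultKind(Enum):
    METADATA = "sample_read_metadata"
    CANDIDATE = "sample_read_candidate"
    REPLACE = "sample_read_replace"


def active_files_change(kind: ReadResultKind, run_completed: bool, promoted: bool = False) -> bool:
    """Whether a result switches the sample's active read files, per the table above."""
    if kind is ReadResultKind.METADATA:
        return False          # only checksums, counts, QC metrics, or report links change
    if kind is ReadResultKind.CANDIDATE:
        return promoted       # stays a pending run artifact until it is promoted
    if kind is ReadResultKind.REPLACE:
        return run_completed  # canonical records are created or superseded on completion
    raise ValueError(f"unknown result kind: {kind!r}")


kind = ReadResultKind("sample_read_candidate")
print(active_files_change(kind, run_completed=True))  # False
```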
For read-cleaning and host-removal workflows, prefer
sample_read_candidate with the admin_review policy. This lets admins inspect
MultiQC or other HTML reports before switching the order to cleaned reads. Raw
and unknown source reads are preserved; promoted cleaned reads become the active
reads used for downstream order pipelines and delivery.
Pending Writebacks (Candidate Review)
When a pipeline stages sample_read_candidate outputs with the admin_review
policy (as the Read Cleaning pipeline does), the cleaned FASTQ files are
stored as run artifacts and held as pending writebacks rather than being
applied automatically. A facility admin reviews the run’s reports, then chooses
which per-sample candidates to promote into active reads.
Promotion is non-destructive: existing raw and unknown source reads are preserved, and an existing active cleaned read is superseded (not deleted) by the promoted candidate. The newly promoted reads become the active reads used for delivery and downstream order pipelines.
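The non-destructive promotion rule can be modeled as an append-only list of read records plus a pointer to the active one. This is a minimal sketch under that assumption; the record shape, the data class strings, and the function name are hypothetical, not SeqDesk's schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReadRecord:
    record_id: str
    data_class: str           # e.g. "raw", "unknown", "cleaned" (assumed values)
    fastq_paths: list[str]
    superseded: bool = False


@dataclass
class SampleReads:
    records: list[ReadRecord] = field(default_factory=list)  # never shrinks
    active_id: Optional[str] = None                           # reads used downstream

    def active(self) -> Optional[ReadRecord]:
        return next((r for r in self.records if r.record_id == self.active_id), None)


def promote_candidate(sample: SampleReads, record_id: str, fastq_paths: list[str]) -> ReadRecord:
    """Turn a reviewed candidate into the sample's active cleaned reads."""
    current = sample.active()
    if current is not None and current.data_class == "cleaned":
        current.superseded = True   # kept for provenance, not deleted
    new = ReadRecord(record_id, "cleaned", list(fastq_paths))
    sample.records.append(new)      # raw/unknown records stay in the list untouched
    sample.active_id = new.record_id
    return new


s = SampleReads([ReadRecord("r1", "raw", ["s1_R1.fastq.gz", "s1_R2.fastq.gz"])], active_id="r1")
promote_candidate(s, "c1", ["s1_clean_R1.fastq.gz", "s1_clean_R2.fastq.gz"])
promote_candidate(s, "c2", ["s1_clean2_R1.fastq.gz", "s1_clean2_R2.fastq.gz"])
print([(r.record_id, r.data_class, r.superseded) for r in s.records])
# [('r1', 'raw', False), ('c1', 'cleaned', True), ('c2', 'cleaned', False)]
```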
API
The review flow is backed by per-run endpoints (FACILITY_ADMIN only):
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /api/pipelines/runs/[id]/pending-writebacks | List discovered candidates and their current status for the run |
| POST | /api/pipelines/runs/[id]/pending-writebacks | Promote selected candidates (optional sampleIds body; omit to promote all) |
Promotion is disabled in the public demo. The GET summary marks each candidate
as candidate or promoted and returns the per-sample target read data class, so
the UI can show what will change before an admin confirms.
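A script could drive the review flow by building requests against these endpoints. The sketch below only constructs the requests with Python's standard urllib; the base URL is a placeholder, sampleIds is assumed to be a JSON array of sample IDs, and how a FACILITY_ADMIN session is authenticated is not covered on this page, so sending the requests with urllib.request.urlopen is left to the caller.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://seqdesk.example.org"  # placeholder deployment URL


def _endpoint(run_id: str) -> str:
    return f"{BASE_URL}/api/pipelines/runs/{run_id}/pending-writebacks"


def list_pending_request(run_id: str) -> urllib.request.Request:
    """GET: list candidates and their status (candidate or promoted) for a run."""
    return urllib.request.Request(_endpoint(run_id), method="GET")


def promote_request(run_id: str, sample_ids: Optional[list[str]] = None) -> urllib.request.Request:
    """POST: promote the given samples' candidates, or all of them when sample_ids is None."""
    if sample_ids is None:
        # No body at all: per the table above, omitting sampleIds promotes every candidate.
        return urllib.request.Request(_endpoint(run_id), method="POST")
    body = json.dumps({"sampleIds": sample_ids}).encode("utf-8")
    return urllib.request.Request(
        _endpoint(run_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = promote_request("run42", ["sample1", "sample2"])
print(req.get_method(), req.full_url)
```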
Reports and Preview Files
Pipeline packages should mark HTML, PDF, TSV, and text summaries as
run_artifact, study_report, or order_report outputs. SeqDesk ranks these
files on the run output page, shows the primary report first when one is
available, and keeps the full output browser available for secondary files.
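The exact ranking rules are not documented on this page. As an illustration only, a ranking that puts previewable report formats first and treats the top-ranked report as the primary one could look like this; the format order, the function names, and the example paths are assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed display order by format; lower rank is shown earlier.
FORMAT_RANK = {".html": 0, ".pdf": 1, ".tsv": 2, ".txt": 3}


def rank_reports(paths: list[str]) -> list[str]:
    """Report formats first (by assumed rank), then every other file, ties by path."""
    def key(p: str):
        return (FORMAT_RANK.get(PurePosixPath(p).suffix.lower(), len(FORMAT_RANK)), p)
    return sorted(paths, key=key)


def primary_report(paths: list[str]) -> Optional[str]:
    """Top-ranked file if it is a report format; None means only the browser is shown."""
    ranked = rank_reports(paths)
    if ranked and PurePosixPath(ranked[0]).suffix.lower() in FORMAT_RANK:
        return ranked[0]
    return None


files = ["qc/summary.tsv", "Assembly/MEGAHIT/sample1.contigs.fa.gz", "multiqc/multiqc_report.html"]
print(rank_reports(files))
# ['multiqc/multiqc_report.html', 'qc/summary.tsv', 'Assembly/MEGAHIT/sample1.contigs.fa.gz']
print(primary_report(files))  # multiqc/multiqc_report.html
```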
Assemblies Viewer
The Assemblies page (/assemblies) provides a centralized view of all
genome assemblies across studies. The table shows:
| Column | Description |
|---|---|
| Study | Study title and associated order number |
| Sample | Sample identifier |
| Final Assembly | The selected assembly with file path and pipeline run info |
| Selection Mode | How the final assembly was chosen (see below) |
| Available Count | Number of alternative assemblies for the sample |
| Download | Download the assembly FASTA file |
Assembly Selection Modes
| Mode | Meaning |
|---|---|
| Marked Final | Admin explicitly selected this assembly as preferred |
| Automatic | System selected the latest available assembly |
| Missing Preferred | Admin marked a preferred assembly, but it is no longer available |
| Unavailable | No assembly exists for this sample |
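The four modes follow from two questions: has an admin marked a preferred assembly, and is it still available? A minimal sketch, assuming "latest" means the most recently created assembly and that a missing preferred assembly does not fall back to another one (neither is stated above); the types and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AssemblyOption:
    assembly_id: str
    created_at: datetime


def resolve_final_assembly(
    available: list[AssemblyOption], preferred_id: Optional[str]
) -> tuple[str, Optional[AssemblyOption]]:
    """Return (selection mode, chosen assembly or None)."""
    if preferred_id is not None:
        match = next((a for a in available if a.assembly_id == preferred_id), None)
        if match is not None:
            return "Marked Final", match
        # Assumption: no fallback when the marked assembly has disappeared.
        return "Missing Preferred", None
    if available:
        # Assumption: "latest" means the most recent creation time.
        return "Automatic", max(available, key=lambda a: a.created_at)
    return "Unavailable", None


opts = [
    AssemblyOption("asm-a", datetime(2024, 1, 5)),
    AssemblyOption("asm-b", datetime(2024, 3, 1)),
]
mode, chosen = resolve_final_assembly(opts, None)
print(mode, chosen.assembly_id)  # Automatic asm-b
```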
By default, only facility admins can download assemblies. Enable the
allowUserAssemblyDownload setting to let researchers download their own
assembly files.
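The download rule reduces to a small predicate. In this sketch the role string FACILITY_ADMIN comes from this page, while the User type, the owner check standing in for "their own" assemblies, the RESEARCHER role name, and the snake_case spelling of allowUserAssemblyDownload are assumptions.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    role: str  # "FACILITY_ADMIN" or a researcher role (name assumed)


def can_download_assembly(user: User, owner_id: str, allow_user_assembly_download: bool) -> bool:
    """Admins always may; researchers only when the setting is on and the assembly is theirs."""
    if user.role == "FACILITY_ADMIN":
        return True
    return allow_user_assembly_download and user.user_id == owner_id


researcher = User("u1", "RESEARCHER")
print(can_download_assembly(researcher, "u1", allow_user_assembly_download=False))  # False
print(can_download_assembly(researcher, "u1", allow_user_assembly_download=True))   # True
```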
Viewing Results
Results are accessible from multiple places:
- Assemblies page — centralized view of all assemblies across studies
- Study page → Pipelines tab — overview of all runs and their results
- Sample detail page — assemblies and bins for a specific sample
- Pipeline run detail — all outputs from a single run