Results: Assemblies & Bins

After a pipeline completes, SeqDesk parses the outputs and links them to samples in the database. This page covers the result types and their data structure.

Assemblies

An assembly represents a set of assembled contigs for a sample.

Field	Description
Assembly Name	Identifier (usually derived from sample name)
Assembly File	Path to the contig FASTA file
Assembly Accession	ENA accession if submitted (GCA…)
Sample	The sample this assembly belongs to
Pipeline Run	The run that produced this assembly
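The table above maps naturally onto a typed record. The following is a minimal TypeScript sketch. The property names (`assemblyName`, `assemblyFile`, and so on) and the helper `hasGcaAccession` are hypothetical and do not reflect SeqDesk's actual schema. The one fixed fact the sketch relies on is the public format of GenBank/ENA assembly accessions: `GCA_`, nine digits, a dot, and a version number.

```typescript
// Hypothetical assembly record mirroring the fields in the table above.
// Property names are illustrative, not SeqDesk's real database schema.
interface AssemblyRecord {
  assemblyName: string; // usually derived from the sample name
  assemblyFile: string; // path to the contig FASTA file
  assemblyAccession?: string; // ENA accession, only present once submitted
  sampleId: string; // the sample this assembly belongs to
  pipelineRunId: string; // the run that produced this assembly
}

// GenBank/ENA assembly accessions look like GCA_000001405.15:
// "GCA_", nine digits, ".", then a version number.
const GCA_ACCESSION = /^GCA_\d{9}\.\d+$/;

// Hypothetical helper: true when the record carries a well-formed GCA accession.
function hasGcaAccession(assembly: AssemblyRecord): boolean {
  return (
    assembly.assemblyAccession !== undefined &&
    GCA_ACCESSION.test(assembly.assemblyAccession)
  );
}

// Illustrative records; paths and IDs are made up.
const unsubmitted: AssemblyRecord = {
  assemblyName: "sample1",
  assemblyFile: "/data/runs/run-42/Assembly/MEGAHIT/sample1.contigs.fa.gz",
  sampleId: "sample1",
  pipelineRunId: "run-42",
};
const submitted: AssemblyRecord = { ...unsubmitted, assemblyAccession: "GCA_000001405.15" };
```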

Preferred Assembly

Each sample can have a preferred assembly: the assembly used for downstream analysis and ENA submission. When multiple assemblies exist (e.g., from MEGAHIT and SPAdes), a facility admin selects which one to use. If none has been marked, the latest available assembly is used.

Output Location

MAG pipeline assemblies are found at:

`{runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz`
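Resolving that template for a given run takes only a few lines. The sketch below assumes `runDir` is a POSIX path on the SeqDesk host and that sample names appear verbatim in file names. `megahitContigsPath` and `findMegahitContigs` are hypothetical helpers, not part of SeqDesk.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Builds {runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz.
// path.posix keeps forward slashes regardless of the host OS.
function megahitContigsPath(runDir: string, sampleName: string): string {
  return path.posix.join(runDir, "Assembly", "MEGAHIT", `${sampleName}.contigs.fa.gz`);
}

// Returns the path only if the file actually exists on disk, otherwise null.
function findMegahitContigs(runDir: string, sampleName: string): string | null {
  const candidate = megahitContigsPath(runDir, sampleName);
  return existsSync(candidate) ? candidate : null;
}
```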

Genome Bins (MAGs)

Genome bins are individual genomes extracted from metagenomic assemblies through binning algorithms.

Field	Description
Bin Name	Identifier (e.g., `sample1.001`)
Bin File	Path to the bin FASTA file
Bin Accession	ENA accession if submitted
Completeness	CheckM completeness score (0–100%)
Contamination	CheckM contamination score (0–100%)
Sample	The sample this bin belongs to
Pipeline Run	The run that produced this bin

Quality Metrics

Bin quality is assessed by CheckM, which estimates:

Completeness — what percentage of a complete genome is present
Contamination — what percentage of the bin is from other organisms

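Common quality thresholds (broadly following the MIMAG standard, which additionally requires rRNA and tRNA genes for high-quality drafts):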

Category	Completeness	Contamination
High quality	≥ 90%	< 5%
Medium quality	≥ 50%	< 10%
Low quality	< 50%	any
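The thresholds translate directly into a classification function. In this sketch, the name `qualityTier` is hypothetical. The rows are checked from strictest to loosest, so a bin lands in the best tier it qualifies for. The table does not cover bins that are at least 50% complete but 10% or more contaminated, so the sketch returns `"unclassified"` for those rather than guessing a tier.

```typescript
type QualityTier = "high" | "medium" | "low" | "unclassified";

// completeness and contamination are CheckM percentages.
function qualityTier(completeness: number, contamination: number): QualityTier {
  if (completeness >= 90 && contamination < 5) return "high";
  if (completeness >= 50 && contamination < 10) return "medium";
  if (completeness < 50) return "low"; // any contamination
  // >= 50% complete but >= 10% contaminated: no row in the table applies.
  return "unclassified";
}

// Hypothetical bins, for illustration only.
const bins = [
  { name: "sample1.001", completeness: 96.1, contamination: 1.2 },
  { name: "sample1.002", completeness: 71.4, contamination: 6.8 },
  { name: "sample1.003", completeness: 32.0, contamination: 0.4 },
];
const tiers = bins.map((b) => ({ name: b.name, tier: qualityTier(b.completeness, b.contamination) }));
```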

Bin Sources

The MAG pipeline can produce bins from multiple binning tools:

MetaBAT2 — default binning
MaxBin2 — alternative binning
CONCOCT — optional (disabled by default)
DAS Tool — bin refinement, combines results from other tools

When DAS Tool has run, its refined bins (at `GenomeBinning/DASTool/bins/`) are preferred over the bins produced by the individual binning tools.
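The sketch below shows one way to express the DAS Tool preference when collecting bin files from a run directory. It assumes bin paths are relative to the run directory. It also assumes that when DAS Tool did not run, every bin from the other tools is kept. That fallback, the `selectBinFiles` name, and the example paths under `GenomeBinning/MetaBAT2/` and `GenomeBinning/MaxBin2/` are assumptions for illustration.

```typescript
// Location of DAS Tool refined bins, relative to the run directory.
const DASTOOL_BINS_DIR = "GenomeBinning/DASTool/bins/";

// Keeps only DAS Tool refined bins when any exist; otherwise returns every
// bin file unchanged (assumed fallback).
function selectBinFiles(relativePaths: string[]): string[] {
  const refined = relativePaths.filter((p) => p.startsWith(DASTOOL_BINS_DIR));
  return refined.length > 0 ? refined : relativePaths;
}

// Hypothetical directory listings.
const withDasTool = [
  "GenomeBinning/MetaBAT2/bins/sample1.1.fa.gz",
  "GenomeBinning/MaxBin2/bins/sample1.001.fa.gz",
  "GenomeBinning/DASTool/bins/sample1.1.fa.gz",
];
const withoutDasTool = withDasTool.slice(0, 2);
```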

Pipeline Artifacts

Beyond assemblies and bins, pipeline runs produce additional artifacts:

Type	Description
`reads`	Processed/filtered reads
`assembly`	Assembled contigs
`bins`	Genome bins
`qc_report`	Quality control reports (MultiQC, FastQC)
`alignment`	BAM alignment files

Each artifact tracks:

File path, size, and checksum
Which pipeline step produced it
Associated study and sample
Tool-specific metadata (JSON)
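The fields each artifact tracks can be modelled as a single record. This TypeScript sketch derives the size and checksum from the file contents using Node's standard `fs` and `crypto` modules. The checksum algorithm SeqDesk uses is not stated on this page. MD5 is assumed here because it is the digest ENA expects for uploaded files. The property names and the `recordArtifact` and `sizeAndChecksum` helpers are also assumptions.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

type ArtifactType = "reads" | "assembly" | "bins" | "qc_report" | "alignment";

// Hypothetical artifact record; property names are illustrative.
interface PipelineArtifact {
  type: ArtifactType;
  path: string;
  sizeBytes: number;
  checksum: string; // hex MD5 digest (assumed algorithm)
  producedByStep: string; // pipeline step that produced the file
  studyId: string;
  sampleId?: string; // study-level outputs may not belong to one sample
  metadata: Record<string, unknown>; // tool-specific JSON
}

// Size and MD5 digest of a file's contents.
function sizeAndChecksum(data: Buffer): { sizeBytes: number; checksum: string } {
  return { sizeBytes: data.length, checksum: createHash("md5").update(data).digest("hex") };
}

// Reading the whole file into memory keeps the sketch short; large FASTQ or
// BAM files would need a streaming hash instead.
function recordArtifact(
  filePath: string,
  info: Omit<PipelineArtifact, "path" | "sizeBytes" | "checksum">,
): PipelineArtifact {
  return { ...info, path: filePath, ...sizeAndChecksum(readFileSync(filePath)) };
}
```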

Read Outputs

Sequencing Order pipelines can also produce read-level results. SeqDesk distinguishes metadata updates from new read files:

Result kind	Behavior
`sample_read_metadata`	Updates the active read record with checksums, counts, QC metrics, or report links
`sample_read_candidate`	Stores new FASTQ files as run artifacts and marks them as pending review
`sample_read_replace`	Creates or supersedes canonical read records when the run completes

For read-cleaning and host-removal workflows, prefer `sample_read_candidate` with the `admin_review` policy. This lets admins inspect MultiQC or other HTML reports before switching the sequencing order to the cleaned reads. Raw and unknown-source reads are preserved; promoted cleaned reads become the active reads used for delivery and by downstream sequencing order pipelines.
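The guidance above amounts to a small decision. Metadata-only outputs update the active read record. New FASTQ files that need a human check are staged as candidates. New FASTQ files that can be applied as-is replace the canonical reads when the run completes. The sketch below encodes that decision. `chooseReadResultKind` and the `ReadOutputTraits` fields are hypothetical; only the result-kind strings and the `admin_review` policy name come from this page.

```typescript
type ReadResultKind =
  | "sample_read_metadata"
  | "sample_read_candidate"
  | "sample_read_replace";

// Hypothetical description of what a pipeline step emits for a sample.
interface ReadOutputTraits {
  producesNewFastq: boolean; // false: only checksums, counts, QC metrics, report links
  needsAdminReview: boolean; // e.g. read cleaning or host removal
}

interface ReadResultDeclaration {
  kind: ReadResultKind;
  policy?: "admin_review";
}

function chooseReadResultKind(t: ReadOutputTraits): ReadResultDeclaration {
  if (!t.producesNewFastq) return { kind: "sample_read_metadata" };
  if (t.needsAdminReview) return { kind: "sample_read_candidate", policy: "admin_review" };
  return { kind: "sample_read_replace" };
}
```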

Pending Writebacks (Candidate Review)

When a pipeline stages `sample_read_candidate` outputs with the `admin_review` policy (as the Read Cleaning pipeline does), the cleaned FASTQ files are stored as run artifacts and held as pending writebacks rather than applied automatically. A facility admin reviews the run’s reports, then chooses which per-sample candidates to promote to active reads.

Promotion is non-destructive: existing raw and unknown source reads are preserved, and an existing active cleaned read is superseded (not deleted) by the promoted candidate. The newly promoted reads become the active reads used for delivery and downstream sequencing order pipelines.
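The promotion rules can be sketched as a pure function over one sample's read records. The data model here is an assumption. Every record is kept in a list, and a separate `activeReadId` points at the reads used for delivery. Superseding is recorded with a hypothetical `supersededById` field instead of a deletion. Raw and unknown-source records are never modified; only an active cleaned record is marked as superseded.

```typescript
type ReadDataClass = "raw" | "unknown" | "cleaned";

interface ReadRecord {
  id: string;
  dataClass: ReadDataClass;
  files: string[];
  supersededById?: string; // set instead of deleting the record
}

interface SampleReads {
  records: ReadRecord[]; // every read record ever attached to the sample
  activeReadId: string | null; // reads used for delivery and downstream pipelines
}

// Promotes a cleaned-read candidate without removing anything.
function promoteCandidate(sample: SampleReads, candidate: { id: string; files: string[] }): SampleReads {
  const active = sample.records.find((r) => r.id === sample.activeReadId);
  const records: ReadRecord[] = sample.records.map((r) =>
    active !== undefined && r.id === active.id && r.dataClass === "cleaned"
      ? { ...r, supersededById: candidate.id } // previous cleaned reads are superseded
      : r, // raw, unknown, and inactive records are left untouched
  );
  records.push({ id: candidate.id, dataClass: "cleaned", files: candidate.files });
  return { records, activeReadId: candidate.id };
}

// Hypothetical sample: raw reads first, then two successive cleaning runs.
const before: SampleReads = {
  records: [{ id: "raw-1", dataClass: "raw", files: ["s1_R1.fastq.gz", "s1_R2.fastq.gz"] }],
  activeReadId: "raw-1",
};
const afterFirst = promoteCandidate(before, { id: "clean-1", files: ["s1_clean_R1.fastq.gz", "s1_clean_R2.fastq.gz"] });
const afterSecond = promoteCandidate(afterFirst, { id: "clean-2", files: ["s1_clean2_R1.fastq.gz", "s1_clean2_R2.fastq.gz"] });
```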

API

The review flow is backed by per-run endpoints (FACILITY_ADMIN only):

Method	Endpoint	Purpose
`GET`	`/api/pipelines/runs/[id]/pending-writebacks`	List discovered candidates and their current status for the run
`POST`	`/api/pipelines/runs/[id]/pending-writebacks`	Promote selected candidates (optional `sampleIds` body; omit to promote all)

Promotion is disabled in the public demo. The GET summary reports each candidate’s status (`candidate` or `promoted`) and its per-sample target read data class, so the UI can show what will change before an admin confirms.
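A thin client for these endpoints might look like the sketch below. It assumes `[id]` in the route is the pipeline run ID. The caller is assumed to supply whatever authentication headers their deployment uses for a FACILITY_ADMIN session. The sketch also assumes that omitting `sampleIds` can be expressed as an empty JSON object. Neither response body is documented here, so the GET result is returned untyped and the POST body is not parsed. All function names are hypothetical.

```typescript
// Builds /api/pipelines/runs/[id]/pending-writebacks for a given run ID.
function pendingWritebacksUrl(baseUrl: string, runId: string): string {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/api/pipelines/runs/${encodeURIComponent(runId)}/pending-writebacks`;
}

// Without sampleIds the body carries no selection, which promotes all candidates.
function promoteRequestBody(sampleIds?: string[]): string {
  return JSON.stringify(sampleIds === undefined ? {} : { sampleIds });
}

// Lists discovered candidates and their status for a run.
async function listPendingWritebacks(
  baseUrl: string,
  runId: string,
  authHeaders: Record<string, string>,
): Promise<unknown> {
  const res = await fetch(pendingWritebacksUrl(baseUrl, runId), { headers: authHeaders });
  if (!res.ok) throw new Error(`GET pending-writebacks failed: HTTP ${res.status}`);
  return res.json();
}

// Promotes the selected candidates, or all of them when sampleIds is omitted.
async function promoteCandidates(
  baseUrl: string,
  runId: string,
  authHeaders: Record<string, string>,
  sampleIds?: string[],
): Promise<void> {
  const res = await fetch(pendingWritebacksUrl(baseUrl, runId), {
    method: "POST",
    headers: { ...authHeaders, "Content-Type": "application/json" },
    body: promoteRequestBody(sampleIds),
  });
  if (!res.ok) throw new Error(`POST pending-writebacks failed: HTTP ${res.status}`);
}
```

Calling `listPendingWritebacks` first and promoting only after inspecting the summary mirrors the review flow described above.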

Reports and Preview Files

Pipeline packages should mark HTML, PDF, TSV, and text summaries as `run_artifact`, `study_report`, or `order_report` outputs. SeqDesk ranks these files on the run output page, shows the primary report first when one is available, and keeps the full output browser available for secondary files.
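This page does not spell out how SeqDesk ranks these files. The sketch below is one plausible ordering, shown only to illustrate the idea. A file flagged as primary comes first. Other files follow in an assumed format order (HTML, PDF, TSV, text), then everything else, with ties broken by path. The `primary` flag, the format order, and `rankOutputs` are all assumptions.

```typescript
type OutputRole = "run_artifact" | "study_report" | "order_report";

interface OutputFile {
  path: string;
  role: OutputRole;
  primary?: boolean; // hypothetical flag marking the primary report
}

// Assumed format preference; unknown extensions sort after these.
const FORMAT_ORDER = [".html", ".pdf", ".tsv", ".txt"];

function formatRank(filePath: string): number {
  const lower = filePath.toLowerCase();
  const index = FORMAT_ORDER.findIndex((ext) => lower.endsWith(ext));
  return index === -1 ? FORMAT_ORDER.length : index;
}

// Returns a new array; the input order is left unchanged.
function rankOutputs(files: OutputFile[]): OutputFile[] {
  return [...files].sort(
    (a, b) =>
      Number(b.primary === true) - Number(a.primary === true) ||
      formatRank(a.path) - formatRank(b.path) ||
      a.path.localeCompare(b.path),
  );
}
```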

Assemblies Viewer

The Assemblies page (/assemblies) provides a centralized view of all genome assemblies across studies. The table shows:

Column	Description
Study	Study title and associated sequencing order number
Sample	Sample identifier
Final Assembly	The selected assembly with file path and pipeline run info
Selection Mode	How the final assembly was chosen (see below)
Available Count	Number of alternative assemblies for the sample
Download	Download the assembly FASTA file

Assembly Selection Modes

Mode	Meaning
Marked Final	Admin explicitly selected this assembly as preferred
Automatic	System selected the latest available assembly
Missing Preferred	Admin marked a preferred assembly, but it is no longer available
Unavailable	No assembly exists for this sample
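For each sample, the four modes follow from two facts: whether an admin marked a preferred assembly, and which assemblies still exist. The sketch below derives the mode and the final assembly from those facts. Its names are hypothetical, and "latest" is assumed to mean the most recent `createdAt`. In the Missing Preferred case it deliberately returns no assembly, because this page does not say whether SeqDesk falls back to another one.

```typescript
type SelectionMode = "Marked Final" | "Automatic" | "Missing Preferred" | "Unavailable";

interface AssemblyOption {
  id: string;
  createdAt: Date; // assumed basis for "latest"
}

interface FinalAssembly {
  mode: SelectionMode;
  assembly: AssemblyOption | null;
}

function resolveFinalAssembly(available: AssemblyOption[], preferredId: string | null): FinalAssembly {
  if (preferredId !== null) {
    const preferred = available.find((a) => a.id === preferredId);
    return preferred
      ? { mode: "Marked Final", assembly: preferred }
      : { mode: "Missing Preferred", assembly: null }; // no fallback in this sketch
  }
  if (available.length === 0) return { mode: "Unavailable", assembly: null };
  // No preference marked: pick the most recently created assembly.
  const latest = available.reduce((best, a) =>
    a.createdAt.getTime() > best.createdAt.getTime() ? a : best,
  );
  return { mode: "Automatic", assembly: latest };
}

// Hypothetical assemblies for one sample.
const megahit = { id: "asm-megahit", createdAt: new Date("2024-03-01") };
const spades = { id: "asm-spades", createdAt: new Date("2024-03-05") };
```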

By default, only facility admins can download assemblies. Enable the `allowUserAssemblyDownload` setting to let researchers download their own assembly files.
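The download rule reduces to a small permission check. In this sketch, "their own assembly files" is read as assemblies belonging to a study the user owns. That ownership model, the `RESEARCHER` role name, and `canDownloadAssembly` are assumptions. `FACILITY_ADMIN` and `allowUserAssemblyDownload` come from this documentation.

```typescript
type Role = "FACILITY_ADMIN" | "RESEARCHER";

interface User {
  id: string;
  role: Role;
}

interface DownloadSettings {
  allowUserAssemblyDownload: boolean; // off by default
}

function canDownloadAssembly(user: User, studyOwnerId: string, settings: DownloadSettings): boolean {
  if (user.role === "FACILITY_ADMIN") return true; // admins can always download
  // Researchers only when the setting is enabled, and only for their own studies.
  return settings.allowUserAssemblyDownload && user.id === studyOwnerId;
}

// Hypothetical users.
const admin: User = { id: "u-admin", role: "FACILITY_ADMIN" };
const researcher: User = { id: "u-1", role: "RESEARCHER" };
```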

Viewing Results

Results are accessible from multiple places:

Assemblies page — centralized view of all assemblies across studies
Study page → Pipelines tab — overview of all runs and their results
Sample detail page — assemblies and bins for a specific sample
Pipeline run detail — all outputs from a single run