Results: Assemblies & Bins

After a pipeline completes, SeqDesk parses the outputs and links them to samples in the database. This page covers the result types and their data structure.

Assemblies

An assembly represents a set of assembled contigs for a sample.

| Field | Description |
| --- | --- |
| Assembly Name | Identifier (usually derived from sample name) |
| Assembly File | Path to the contig FASTA file |
| Assembly Accession | ENA accession if submitted (GCA…) |
| Sample | The sample this assembly belongs to |
| Pipeline Run | The run that produced this assembly |

Preferred Assembly

Each sample can have a preferred assembly, which is the one used for downstream analysis and ENA submission. When multiple assemblies exist (e.g., from MEGAHIT and SPAdes), a facility admin can mark one as preferred; otherwise the latest available assembly is selected automatically.

Output Location

MAG pipeline assemblies are found at:

{runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz
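As a minimal sketch, the path pattern above can be expanded with a small helper. The function name `megahit_contigs_path` and the example run directory are hypothetical and not part of SeqDesk; only the directory layout comes from the pattern shown.

```python
from pathlib import PurePosixPath


def megahit_contigs_path(run_dir: str, sample_name: str) -> PurePosixPath:
    """Expand {runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz.

    Hypothetical helper: it only fills in the documented pattern and does
    not check that the file exists.
    """
    return (
        PurePosixPath(run_dir)
        / "Assembly"
        / "MEGAHIT"
        / f"{sample_name}.contigs.fa.gz"
    )


# Example values are made up for illustration.
print(megahit_contigs_path("/data/runs/run-42", "sample1"))
```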

Genome Bins (MAGs)

Genome bins are individual genomes extracted from metagenomic assemblies through binning algorithms.

| Field | Description |
| --- | --- |
| Bin Name | Identifier (e.g., sample1.001) |
| Bin File | Path to the bin FASTA file |
| Bin Accession | ENA accession if submitted |
| Completeness | CheckM completeness score (0–100%) |
| Contamination | CheckM contamination score (0–100%) |
| Sample | The sample this bin belongs to |
| Pipeline Run | The run that produced this bin |

Quality Metrics

Bin quality is assessed by CheckM, which estimates:

  • Completeness — what percentage of a complete genome is present
  • Contamination — what percentage of the bin is from other organisms

Common quality thresholds:

| Category | Completeness | Contamination |
| --- | --- | --- |
| High quality | ≥ 90% | < 5% |
| Medium quality | ≥ 50% | < 10% |
| Low quality | < 50% | any |
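The thresholds in the table above can be applied with a short function. This is an illustrative sketch, not SeqDesk's own code: the name `classify_bin` is hypothetical, and the `"unclassified"` label is an assumption for bins the table does not cover (completeness of at least 50% with contamination of 10% or more).

```python
def classify_bin(completeness: float, contamination: float) -> str:
    """Map CheckM percentages to the quality categories in the table.

    Both arguments are percentages (e.g. 92.5 for 92.5%).
    """
    if completeness < 50:
        # Low quality: any contamination value is accepted.
        return "low"
    if completeness >= 90 and contamination < 5:
        return "high"
    if contamination < 10:
        # Completeness is already known to be >= 50 here.
        return "medium"
    # Assumption: the table does not name this case, so flag it explicitly.
    return "unclassified"


for comp, cont in [(95.0, 3.0), (95.0, 7.0), (60.0, 8.0), (40.0, 50.0), (95.0, 12.0)]:
    print(comp, cont, classify_bin(comp, cont))
```

Checking the low-completeness case first keeps the remaining branches simple, because every later branch can assume completeness is at least 50%.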

Bin Sources

The MAG pipeline can produce bins from multiple binning tools:

  1. MetaBAT2 — default binning
  2. MaxBin2 — alternative binning
  3. CONCOCT — optional (disabled by default)
  4. DAS Tool — bin refinement, combines results from other tools

DAS Tool refined bins (at GenomeBinning/DASTool/bins/) are preferred.

Pipeline Artifacts

Beyond assemblies and bins, pipeline runs produce additional artifacts:

| Type | Description |
| --- | --- |
| reads | Processed/filtered reads |
| assembly | Assembled contigs |
| bins | Genome bins |
| qc_report | Quality control reports (MultiQC, FastQC) |
| alignment | BAM alignment files |

Each artifact tracks:

  • File path, size, and checksum
  • Which pipeline step produced it
  • Associated study and sample
  • Tool-specific metadata (JSON)
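One way to picture an artifact record is as a small data class. The sketch below is an assumption about shape only: the class name `PipelineArtifact` and all field names are hypothetical, while the five artifact types and the tracked properties come from the table and list above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional

# Artifact types as listed in the table above.
ARTIFACT_TYPES = {"reads", "assembly", "bins", "qc_report", "alignment"}


@dataclass
class PipelineArtifact:
    """Hypothetical record layout; field names are illustrative only."""

    type: str
    path: str
    size_bytes: int
    checksum: str
    pipeline_step: str
    study_id: Optional[str] = None
    sample_id: Optional[str] = None
    # Tool-specific metadata, kept JSON-serializable.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.type not in ARTIFACT_TYPES:
            raise ValueError(f"unknown artifact type: {self.type!r}")


# Example values are made up for illustration.
artifact = PipelineArtifact(
    type="assembly",
    path="Assembly/MEGAHIT/sample1.contigs.fa.gz",
    size_bytes=1_234_567,
    checksum="d41d8cd98f00b204e9800998ecf8427e",
    pipeline_step="MEGAHIT",
    sample_id="sample1",
    metadata={"tool": "MEGAHIT"},
)
print(json.dumps(asdict(artifact), indent=2))
```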

Read Outputs

Order pipelines can also produce read-level results. SeqDesk distinguishes metadata updates from new read files:

| Result kind | Behavior |
| --- | --- |
| sample_read_metadata | Updates the active read record with checksums, counts, QC metrics, or report links |
| sample_read_candidate | Stores new FASTQ files as run artifacts and marks them as pending review |
| sample_read_replace | Creates or supersedes canonical read records when the run completes |

For read-cleaning and host-removal workflows, prefer sample_read_candidate with the admin_review policy. This lets admins inspect MultiQC or other HTML reports before switching the order to cleaned reads. Raw and unknown-source reads are preserved; promoted cleaned reads become the active reads used for downstream order pipelines and delivery.

Pending Writebacks (Candidate Review)

When a pipeline stages sample_read_candidate outputs with the admin_review policy (as the Read Cleaning pipeline does), the cleaned FASTQ files are stored as run artifacts and held as pending writebacks rather than being applied automatically. A facility admin reviews the run’s reports, then chooses which per-sample candidates to promote into active reads.

Promotion is non-destructive: existing raw and unknown source reads are preserved, and an existing active cleaned read is superseded (not deleted) by the promoted candidate. The newly promoted reads become the active reads used for delivery and downstream order pipelines.
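The non-destructive behavior described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not SeqDesk's data model: the `ReadRecord` class, its field names, the `"raw"`/`"unknown"`/`"cleaned"` labels, and the `promote_candidate` function are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ReadRecord:
    """Hypothetical read record; field names are illustrative only."""

    sample_id: str
    data_class: str  # assumed labels: "raw", "unknown", "cleaned"
    path: str
    active: bool = False
    superseded: bool = False


def promote_candidate(
    records: List[ReadRecord], sample_id: str, candidate_path: str
) -> List[ReadRecord]:
    """Return a new record list with the candidate promoted to active.

    No record is ever dropped: a previously active cleaned read is marked
    superseded, and raw or unknown-source reads are kept unchanged.
    """
    updated = []
    for rec in records:
        if rec.sample_id == sample_id and rec.active:
            rec = replace(
                rec, active=False, superseded=(rec.data_class == "cleaned")
            )
        updated.append(rec)
    updated.append(ReadRecord(sample_id, "cleaned", candidate_path, active=True))
    return updated


# Example values are made up for illustration.
records = [
    ReadRecord("s1", "raw", "s1_R1.fastq.gz"),
    ReadRecord("s1", "cleaned", "s1_clean_v1.fastq.gz", active=True),
]
result = promote_candidate(records, "s1", "s1_clean_v2.fastq.gz")
for rec in result:
    print(rec)
```

Returning a new list instead of mutating in place makes it easy to compare the state before and after promotion, which mirrors the review step where an admin sees what will change.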

API

The review flow is backed by per-run endpoints (FACILITY_ADMIN only):

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /api/pipelines/runs/[id]/pending-writebacks | List discovered candidates and their current status for the run |
| POST | /api/pipelines/runs/[id]/pending-writebacks | Promote selected candidates (optional sampleIds body; omit to promote all) |

Both endpoints require the FACILITY_ADMIN role, and promotion is disabled in the public demo. The GET response marks each candidate's status as candidate or promoted and returns the per-sample target read data class, so the UI can show what will change before an admin confirms.
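A client could prepare requests for these two endpoints roughly as follows. This sketch uses only Python's standard library and makes several assumptions: the base URL and run ID are placeholders, the JSON body shape `{"sampleIds": [...]}` is inferred from the "optional sampleIds body" wording, sending an empty object when promoting everything is a guess, and authentication is left out because the mechanism is not described here. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional, Sequence


def _endpoint(base_url: str, run_id: str) -> str:
    return f"{base_url.rstrip('/')}/api/pipelines/runs/{run_id}/pending-writebacks"


def build_list_request(base_url: str, run_id: str) -> urllib.request.Request:
    """Prepare the GET that lists candidates for a run."""
    return urllib.request.Request(_endpoint(base_url, run_id), method="GET")


def build_promote_request(
    base_url: str, run_id: str, sample_ids: Optional[Sequence[str]] = None
) -> urllib.request.Request:
    """Prepare the POST that promotes candidates.

    Assumption: omitting sampleIds from the JSON body promotes all
    candidates, so None maps to an empty object.
    """
    body = {} if sample_ids is None else {"sampleIds": list(sample_ids)}
    return urllib.request.Request(
        _endpoint(base_url, run_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and run ID; nothing is sent over the network here.
req = build_promote_request("https://seqdesk.example.org", "run-42", ["s1", "s2"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The prepared request would be sent with `urllib.request.urlopen(req)` from a session authenticated as a facility admin.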

Reports and Preview Files

Pipeline packages should mark HTML, PDF, TSV, and text summaries as run_artifact, study_report, or order_report outputs. SeqDesk ranks these files on the run output page, shows the primary report first when one is available, and keeps the full output browser available for secondary files.

Assemblies Viewer

The Assemblies page (/assemblies) provides a centralized view of all genome assemblies across studies. The table shows:

| Column | Description |
| --- | --- |
| Study | Study title and associated order number |
| Sample | Sample identifier |
| Final Assembly | The selected assembly with file path and pipeline run info |
| Selection Mode | How the final assembly was chosen (see below) |
| Available Count | Number of alternative assemblies for the sample |
| Download | Download the assembly FASTA file |

Assembly Selection Modes

| Mode | Meaning |
| --- | --- |
| Marked Final | Admin explicitly selected this assembly as preferred |
| Automatic | System selected the latest available assembly |
| Missing Preferred | Admin marked a preferred assembly, but it is no longer available |
| Unavailable | No assembly exists for this sample |
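The four modes can be read as a small decision over two inputs: which assemblies are available and whether an admin marked one as preferred. The function below is a sketch of that reading, not SeqDesk's implementation. The name `selection_mode` and its parameters are hypothetical, and returning "Missing Preferred" when a preferred assembly is marked but no assemblies remain at all is an assumption.

```python
from typing import Optional, Sequence


def selection_mode(
    available_ids: Sequence[str], preferred_id: Optional[str]
) -> str:
    """Derive the selection mode label for one sample.

    available_ids: IDs of assemblies that currently exist for the sample.
    preferred_id:  ID an admin marked as preferred, or None if unmarked.
    """
    if preferred_id is not None:
        if preferred_id in available_ids:
            return "Marked Final"
        # Assumption: this also covers the case where no assemblies remain.
        return "Missing Preferred"
    if available_ids:
        return "Automatic"
    return "Unavailable"


# Example IDs are made up for illustration.
print(selection_mode(["asm-1", "asm-2"], "asm-2"))
print(selection_mode(["asm-1"], None))
print(selection_mode(["asm-1"], "asm-9"))
print(selection_mode([], None))
```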

By default, only facility admins can download assemblies. Enable the allowUserAssemblyDownload setting to let researchers download their own assembly files.

Viewing Results

Results are accessible from multiple places:

  • Assemblies page — centralized view of all assemblies across studies
  • Study page → Pipelines tab — overview of all runs and their results
  • Sample detail page — assemblies and bins for a specific sample
  • Pipeline run detail — all outputs from a single run