Available Pipelines
SeqDesk ships with built-in support for study pipelines and order pipelines. Study pipelines run across the samples of a study. Order pipelines operate on linked sequencing files inside an order and are typically used for simulation, validation, QC, and read cleaning.
The installed-pipeline catalog is built only from packages discovered under
pipelines/. There are four built-in order pipelines: Simulate Reads,
FASTQ Checksum, FastQC, and Read Cleaning.
Study Pipelines
MAG (Metagenome-Assembled Genomes)
Pipeline: nf-core/mag v3.0.0
Purpose: Assembly and binning of metagenomes
Input: Paired-end FASTQ reads
Role required: FACILITY_ADMIN
What MAG Does
The MAG pipeline takes raw metagenomic sequencing reads and produces:
- Quality-controlled reads — trimmed and host-removed
- Assembled contigs — via MEGAHIT and/or SPAdes
- Genome bins — via MetaBAT2, MaxBin2, and/or CONCOCT
- Refined bins — via DAS Tool
- Quality scores — completeness and contamination via CheckM
- Taxonomy — classification via GTDB-Tk
- QC summary — MultiQC report
Configuration
| Parameter | Default | Description |
|---|---|---|
| stubMode | false | Test mode (fast, no real analysis) |
| skipMegahit | false | Skip MEGAHIT assembler |
| skipSpades | true | Skip SPAdes assembler |
| skipProkka | true | Skip Prokka gene annotation |
| skipConcoct | true | Skip CONCOCT binning |
| skipBinQc | false | Skip bin quality control |
| skipGtdb | false | Skip GTDB-Tk taxonomy |
| skipGunc | false | Skip GUNC contamination check |
| gtdbDb | — | Path to GTDB-Tk database (optional) |
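As a rough illustration of how these flags combine, the sketch below models the defaults from the table as a Python dataclass and lists the optional steps that would still run. The names MagConfig and enabled_steps are hypothetical and not part of SeqDesk; only the parameter names and default values come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MagConfig:
    """Hypothetical container mirroring the defaults in the table above."""
    stubMode: bool = False
    skipMegahit: bool = False
    skipSpades: bool = True
    skipProkka: bool = True
    skipConcoct: bool = True
    skipBinQc: bool = False
    skipGtdb: bool = False
    skipGunc: bool = False
    gtdbDb: Optional[str] = None  # optional path, unset by default


def enabled_steps(cfg: MagConfig) -> List[str]:
    """Return the optional steps whose skip flag is not set."""
    flags = [
        ("MEGAHIT", cfg.skipMegahit),
        ("SPAdes", cfg.skipSpades),
        ("Prokka", cfg.skipProkka),
        ("CONCOCT", cfg.skipConcoct),
        ("bin QC", cfg.skipBinQc),
        ("GTDB-Tk", cfg.skipGtdb),
        ("GUNC", cfg.skipGunc),
    ]
    return [name for name, skipped in flags if not skipped]
```

With the defaults, only MEGAHIT, bin QC, GTDB-Tk, and GUNC remain enabled; passing skipSpades=False would add SPAdes to the list.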
Outputs
| Output | Location | Description |
|---|---|---|
| Assemblies | Assembly/MEGAHIT/ | Per-sample contig files (.contigs.fa.gz) |
| Bins | GenomeBinning/DASTool/bins/ | Refined genome bins (.fa) |
| CheckM | GenomeBinning/QC/ | Completeness and contamination TSV |
| GTDB-Tk | Taxonomy/GTDB-Tk/ | Taxonomy classification TSV |
| MultiQC | multiqc/ | Aggregate QC HTML report |
Assemblies and bins are automatically parsed and linked to samples in the database after the run completes.
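To make the output layout concrete, here is a minimal sketch that locates assembly and bin files in a finished run directory using the locations and suffixes from the table. It is not SeqDesk's actual parser; the function name find_outputs and the idea of passing a run directory are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Relative locations and file suffixes taken from the outputs table.
OUTPUT_PATTERNS: Dict[str, Tuple[str, str]] = {
    "assemblies": ("Assembly/MEGAHIT", "*.contigs.fa.gz"),
    "bins": ("GenomeBinning/DASTool/bins", "*.fa"),
}


def find_outputs(run_dir: Path) -> Dict[str, List[Path]]:
    """Collect assembly and bin files under a run directory (hypothetical helper)."""
    found: Dict[str, List[Path]] = {}
    for kind, (subdir, pattern) in OUTPUT_PATTERNS.items():
        base = run_dir / subdir
        # A missing directory simply yields an empty list.
        found[kind] = sorted(base.glob(pattern)) if base.is_dir() else []
    return found
```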
SubMG (ENA Submission Pipeline)
Pipeline: ttubb/submg v1.0.0
Purpose: Automated ENA submission of reads, assemblies, and bins
Input: Samples with reads, assemblies, and optionally bins
Role required: FACILITY_ADMIN
What SubMG Does
SubMG automates the submission of sequencing data to the European Nucleotide Archive (ENA). It handles the full submission workflow:
- Validate inputs — checks study/sample prerequisites and ENA credentials
- Generate config — creates SubMG YAML manifests and helper files
- Submit — executes submg submit for each manifest
- Parse receipts — reads ENA responses and stores accession numbers
Prerequisites
Before running SubMG, ensure:
- The study has an ENA study accession (PRJEB...)
- Samples have taxonomy IDs assigned
- Reads are linked to samples
- ENA credentials are configured (see ENA Credentials)
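The checklist above could be expressed as a small pre-flight check. The sketch below is only an illustration: the dictionary keys (ena_accession, samples, taxonomy_id, reads, ena_credentials_configured) are hypothetical and do not reflect SeqDesk's real data model.

```python
from typing import Dict, List


def missing_prerequisites(study: Dict) -> List[str]:
    """Return human-readable problems; an empty list means ready to submit.

    All field names are assumptions made for this example.
    """
    problems: List[str] = []
    accession = study.get("ena_accession") or ""
    if not accession.startswith("PRJEB"):
        problems.append("study has no ENA study accession (PRJEB...)")
    for sample in study.get("samples", []):
        name = sample.get("name", "<unnamed>")
        if not sample.get("taxonomy_id"):
            problems.append(f"sample {name}: no taxonomy ID assigned")
        if not sample.get("reads"):
            problems.append(f"sample {name}: no reads linked")
    if not study.get("ena_credentials_configured"):
        problems.append("ENA credentials are not configured")
    return problems
```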
Configuration
| Parameter | Default | Description |
|---|---|---|
| skipChecks | true | Skip pre-submission validation |
| submitBins | true | Include genome bins in the submission |
| condaEnv | submg | Conda environment name |
| assemblySoftware | MEGAHIT | Assembler used (for ENA metadata) |
| completenessSoftware | CheckM | QC software used |
| binningSoftware | MetaBAT2 | Binner used |
Outputs
After a successful run, accession numbers are stored back in the database:
- Sample accessions — ERS/SAMEA numbers
- Read accessions — ERX/ERR numbers
- Assembly accessions — linked to Assembly records
- Bin accessions — linked to Bin records
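For orientation, the accession prefixes mentioned above can be told apart with simple patterns. This sketch is not how SeqDesk parses receipts; accession_kind is a hypothetical helper, and it only covers the ENA-issued prefixes named in this section (ERS and SAMEA for samples, ERX for experiments, ERR for runs).

```python
import re
from typing import Optional

# Prefix patterns for the accession types listed above.
ACCESSION_KINDS = [
    ("sample", re.compile(r"(ERS|SAMEA)\d+")),
    ("experiment", re.compile(r"ERX\d+")),
    ("run", re.compile(r"ERR\d+")),
]


def accession_kind(accession: str) -> Optional[str]:
    """Classify an accession by prefix; returns None if it matches nothing."""
    value = accession.strip()
    for kind, pattern in ACCESSION_KINDS:
        if pattern.fullmatch(value):
            return kind
    return None
```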
MetaxPath (Pathogen Profiling) — external / optional
MetaxPath is not shipped in the public package catalog. The
installed-pipeline catalog is built only from packages discovered under
pipelines/, and MetaxPath is not one of them. It is an
optional/private add-on package that an operator can install separately. The
configuration below documents that package for installs that have it.
Pipeline: hzi-bifo/MetaxPath-Nextflow v0.1.6
Purpose: Long-read clinical metagenomics for pathogen identification, virulence, and AMR detection
Input: Long-read FASTQ files (Oxford Nanopore or PacBio)
Role required: FACILITY_ADMIN
What MetaxPath Does
MetaxPath is designed for clinical metagenomics on long-read sequencing data. It performs:
- Human-read filtering — removes host contamination
- Taxonomic profiling — via Metax and Sylph
- Assembly — via Flye (or configurable assemblers)
- Virulence factor prediction — identifies pathogenic gene markers
- AMR detection — predicts antibiotic resistance genes
- Reporting — generates HTML reports and species abundance dotplots
Supported Sequencers
- Oxford Nanopore (MinION, GridION, PromethION)
- PacBio (Sequel, Revio)
Configuration
| Parameter | Default | Description |
|---|---|---|
| sequencer | Nanopore | Sequencing platform (Nanopore or PacBio) |
| assemblers | flye | Comma-separated assembler list |
| threads | 20 | CPU threads per process |
| topn | 50 | Number of top species in reports |
| skipSylph | false | Skip Sylph profiling |
| skipVirulence | false | Skip virulence factor prediction |
| skipAmr | false | Skip AMR detection |
Database paths (must be configured by admin):
| Parameter | Description |
|---|---|
| metaxDb | Metax database prefix (without .json) |
| metaxDmpDir | NCBI taxonomy dump directory |
| kraken2Db | Kraken2 database path |
| sylphDb | Sylph database path |
| refIndex | Host reference minimap2 index |
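Because the database paths have no defaults, it can help to check a configuration before launching a run. The following is a minimal sketch under stated assumptions: check_metaxpath_config is a hypothetical helper, it treats all five database parameters as required regardless of the skip flags, and it only uses parameter names and allowed values from the tables above.

```python
from typing import Dict, List

DB_PARAMS = ["metaxDb", "metaxDmpDir", "kraken2Db", "sylphDb", "refIndex"]
SEQUENCERS = ("Nanopore", "PacBio")


def check_metaxpath_config(cfg: Dict[str, object]) -> List[str]:
    """Return a list of configuration problems (empty when none are found)."""
    problems: List[str] = []
    sequencer = cfg.get("sequencer", "Nanopore")
    if sequencer not in SEQUENCERS:
        problems.append(f"sequencer must be Nanopore or PacBio, got {sequencer!r}")
    # assemblers is a comma-separated string; ignore empty entries.
    raw = str(cfg.get("assemblers", "flye"))
    assemblers = [item.strip() for item in raw.split(",") if item.strip()]
    if not assemblers:
        problems.append("assemblers must name at least one assembler")
    for name in DB_PARAMS:
        if not str(cfg.get(name) or "").strip():
            problems.append(f"{name} is not configured")
    return problems
```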
Outputs
| Output | Scope | Description |
|---|---|---|
| Profile with VFs/AMRs | Per sample | Merged taxonomic profile with virulence and AMR annotations |
| Top-N HTML report | Per study | Combined species abundance report |
| Readcount stats | Per study | Combined readcount summary |
| Dotplots | Per study | Species abundance visualizations (PDF) |
Reads QC (Quality Overview)
Pipeline: reads-qc v0.1.0
Purpose: Per-sample FASTQ statistics with an HTML summary report
Input: Linked sample reads (any scope; runs at study level)
Role required: FACILITY_ADMIN
Reads QC computes read count, base count, average quality, and GC content for each sample’s FASTQ files and rolls them up into a study-level HTML overview. It’s a lighter alternative to per-sample FastQC when all you need is a quick comparison across the samples in a study. macOS ARM local runs are supported.
Main outputs:
- Per-sample read statistics (counts, bases, quality, GC%)
- Study-level HTML summary report
- Study-level TSV with per-sample metrics
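The per-sample metrics can be illustrated with a short FASTQ reader. This is a sketch, not the pipeline's implementation: it assumes four-line FASTQ records, Phred+33 quality encoding, and that average quality means the mean of all per-base scores. The function name fastq_stats is hypothetical.

```python
import gzip
from pathlib import Path
from typing import Dict, Union


def fastq_stats(path: Union[str, Path]) -> Dict[str, float]:
    """Compute read count, base count, mean quality, and GC% for one FASTQ file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    reads = bases = gc = qual_sum = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            qual = handle.readline().strip()
            reads += 1
            bases += len(seq)
            gc += sum(1 for base in seq.upper() if base in "GC")
            qual_sum += sum(ord(ch) - 33 for ch in qual)  # Phred+33 assumed
    return {
        "reads": reads,
        "bases": bases,
        "mean_quality": qual_sum / bases if bases else 0.0,
        "gc_percent": 100.0 * gc / bases if bases else 0.0,
    }
```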
Study Demo Report
Pipeline: study-demo-report v0.1.0
Purpose: Deterministic HTML, Markdown, and TSV outputs for testing pipeline integration
Input: Study + samples
Role required: FACILITY_ADMIN
Study Demo Report is a smoke-test pipeline that produces deterministic outputs without any real bioinformatics work. Use it to verify that pipeline execution, weblog ingestion, output parsing, and the Assemblies/Results UI work together end to end without spending CPU on real analysis. It is useful in CI and as a first run after configuring a new install. macOS ARM local runs are supported.
Main outputs:
- Study-scope HTML report
- Markdown summary
- TSV per-sample table
Order Pipelines
Simulate Reads
Pipeline: simulate-reads v0.2.0
Purpose: Generate dummy FASTQ files for selected order samples
Input: Order samples
Role required: FACILITY_ADMIN
Simulate Reads generates synthetic FASTQ files and links them back to canonical
Read records. It is mainly useful for demos, smoke tests, and exercising
downstream order-scoped QC workflows.
Main outputs:
- Generated FASTQ files linked to Read records
- Read counts written back to canonical read fields
- Run-level simulation summary TSV
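To give a sense of what a dummy FASTQ file looks like, the sketch below writes random reads with a fixed seed so that repeated runs produce identical files. It is an illustration only and does not reproduce the simulate-reads pipeline; the function name, read naming scheme, and constant quality string are assumptions.

```python
import random
from pathlib import Path
from typing import Union


def write_dummy_fastq(path: Union[str, Path], n_reads: int = 100,
                      read_length: int = 50, seed: int = 0) -> int:
    """Write n_reads random reads to path and return the number written."""
    rng = random.Random(seed)  # seeded so output is reproducible
    with open(path, "w") as handle:
        for i in range(n_reads):
            seq = "".join(rng.choice("ACGT") for _ in range(read_length))
            qual = "I" * read_length  # constant Phred 40 in Phred+33 encoding
            handle.write(f"@sim_read_{i + 1}\n{seq}\n+\n{qual}\n")
    return n_reads
```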
FASTQ Checksum
Pipeline: fastq-checksum v0.1.0
Purpose: Compute MD5 checksums for linked FASTQ files
Input: Linked order FASTQ files
Role required: FACILITY_ADMIN
FASTQ Checksum computes canonical MD5 checksums for linked read files and
stores them back on the corresponding Read records for downstream validation
and submission workflows.
Main outputs:
- checksum1/checksum2 on Read records
- Run-level checksum summary TSV
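Computing an MD5 checksum for a large FASTQ file is usually done in chunks so the whole file never has to fit in memory. The sketch below shows that approach with Python's hashlib; it hashes the file bytes as stored on disk (compressed files are not decompressed first), which is an assumption about what the pipeline does. The function name md5_of_file is hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting string can be compared against the output of a tool such as md5sum to validate a transfer.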
FastQC
Pipeline: fastqc v0.1.0
Purpose: Run read quality control on linked FASTQ files
Input: Linked order FASTQ files
Role required: FACILITY_ADMIN
FastQC runs per-sample QC against linked order reads, publishes HTML reports
and zip archives, and stores selected summary metrics back onto the canonical
Read record.
Main outputs:
- Per-sample FastQC HTML reports
- Per-sample FastQC zip archives
- Read counts and average quality metrics on Read records
- Run-level FastQC summary TSV
Read Cleaning
Pipeline: read-cleaning v0.1.0 (wraps nf-core/detaxizer)
Purpose: Screen raw or unknown FASTQ reads for host/contaminant sequences and stage cleaned reads for admin review
Input: Active order reads marked raw or unknown (single or paired)
Role required: FACILITY_ADMIN
Read Cleaning runs nf-core/detaxizer to identify contaminant reads (by default
Homo sapiens), filters them out, and writes per-sample cleaned FASTQ files.
The cleaned files are staged as candidate reads (run artifacts) for an
admin to review and promote — raw and unknown source reads are never silently
overwritten. The pipeline is admin-only and not shown to researchers by
default (visibility.showToUser: false, userCanStart: false).
Configuration:
| Parameter | Default | Description |
|---|---|---|
| tax2filter | Homo sapiens | Taxon name/ID passed to detaxizer |
| classificationKraken2 | true | Use Kraken2 to identify contaminant reads |
| kraken2Db | — | Local Kraken2 DB path or approved reference URI (required when Kraken2 is enabled) |
| classificationBbduk | false | Use BBDuk k-mer matching against a contaminant FASTA |
| bbdukReference | — | Local contaminant reference FASTA for BBDuk |
| filteringTool | seqkit | Read filtering backend (seqkit or bbmap) |
| readType | auto | Map single files to short- or long-read detaxizer columns (auto/short/long) |
| outputRemovedReads | false | Also write removed contaminant reads to the output folder |
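The constraints stated in the table can be checked before a run is started. This is a minimal sketch, assuming a plain dictionary of overrides merged over the documented defaults; check_read_cleaning_config is a hypothetical helper and only enforces rules the table states explicitly.

```python
from typing import Dict, List

# Defaults copied from the configuration table; None marks an unset path.
DEFAULTS: Dict[str, object] = {
    "tax2filter": "Homo sapiens",
    "classificationKraken2": True,
    "kraken2Db": None,
    "classificationBbduk": False,
    "bbdukReference": None,
    "filteringTool": "seqkit",
    "readType": "auto",
    "outputRemovedReads": False,
}


def check_read_cleaning_config(overrides: Dict[str, object]) -> List[str]:
    """Merge overrides over the defaults and return any problems found."""
    cfg = {**DEFAULTS, **overrides}
    problems: List[str] = []
    if cfg["classificationKraken2"] and not cfg["kraken2Db"]:
        problems.append("kraken2Db is required when classificationKraken2 is enabled")
    if cfg["filteringTool"] not in ("seqkit", "bbmap"):
        problems.append("filteringTool must be seqkit or bbmap")
    if cfg["readType"] not in ("auto", "short", "long"):
        problems.append("readType must be auto, short, or long")
    return problems
```

Note that the defaults alone do not pass this check: Kraken2 classification is on by default, so a Kraken2 database must be supplied or Kraken2 disabled.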
Main outputs:
- Per-sample cleaned-read candidates staged for admin review (see Adding Custom Pipelines and Results)
- MultiQC HTML report (classification/filtering evidence)
- Run-level detaxizer classification/filtering summary TSV
Cleaned reads become the active reads used for downstream order pipelines only after an admin promotes the candidates via the pending-writebacks review flow.
Adding More Pipelines
SeqDesk supports adding custom pipelines through a package structure. See Adding Custom Pipelines for details on creating your own pipeline integrations.