Adding Custom Pipelines

SeqDesk uses a modular pipeline package system. Each installed package lives in pipelines/{pipeline-id}/ and combines runtime metadata, UI configuration, generated input definitions, and optional discovery helpers.

Package Structure

pipelines/{pipeline-id}/
├── manifest.json        # Runtime contract: targets, inputs, outputs, writeback
├── definition.json      # Workflow DAG with step dependencies
├── registry.json        # UI configuration and config schema
├── samplesheet.yaml     # Generated input definitions
├── README.md            # Documentation
└── scripts/
    └── discover-outputs.mjs   # Optional output discovery/writeback helper

manifest.json

The manifest is the runtime source of truth for what a package supports, what it reads, how it executes, what it produces, and which canonical records it may write back to.

{
  "manifestVersion": 1,
  "package": {
    "id": "my-pipeline",
    "name": "My Pipeline",
    "version": "1.0.0",
    "description": "Description shown in SeqDesk"
  },
  "targets": { "supported": ["order"] },
  "inputs": [
    { "id": "reads", "scope": "sample", "source": "sample.reads", "required": true }
  ],
  "execution": {
    "type": "nextflow",
    "pipeline": "./workflow",
    "version": "1.0.0",
    "profiles": ["conda"]
  },
  "outputs": [
    {
      "id": "sample_checksums",
      "scope": "sample",
      "destination": "sample_reads",
      "type": "artifact",
      "writeback": {
        "target": "Read",
        "mode": "merge",
        "fields": { "checksum1": "checksum1", "checksum2": "checksum2" }
      },
      "discovery": { "pattern": "checksums/*.json", "matchSampleBy": "filename" }
    }
  ]
}

Important manifest responsibilities:

  • targets.supported declares whether a package is meant for study, order, or both
  • inputs[].source describes which SeqDesk records the package consumes
  • outputs[].discovery tells SeqDesk how to locate produced files
  • outputs[].result declares what the output means to SeqDesk: artifact, report, metadata update, staged read candidate, or canonical replacement
  • outputs[].writeback declares safe canonical destinations such as Read
  • actual database writes are still validated and executed centrally by SeqDesk
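As a rough illustration of these responsibilities, the sketch below runs a few structural checks on a parsed manifest before it is handed to SeqDesk. The function checkManifest and the exact rules it applies are assumptions made for this example; it is not SeqDesk's own validator, which remains the authority.

```javascript
// Hypothetical pre-flight check for a parsed manifest.json.
// The rules below are assumptions of this sketch, not SeqDesk's validator.
const KNOWN_TARGETS = ["study", "order"];

function checkManifest(manifest) {
  const problems = [];

  // targets.supported should name at least one of: study, order
  const supported = manifest?.targets?.supported;
  if (!Array.isArray(supported) || supported.length === 0) {
    problems.push("targets.supported must be a non-empty array");
  } else {
    for (const target of supported) {
      if (!KNOWN_TARGETS.includes(target)) {
        problems.push(`unknown target: ${target}`);
      }
    }
  }

  // every input needs an id and a source such as "sample.reads"
  for (const [index, input] of (manifest?.inputs ?? []).entries()) {
    if (!input.id) problems.push(`inputs[${index}] is missing id`);
    if (!input.source) problems.push(`inputs[${index}] is missing source`);
  }

  // every output needs an id so discovery results can refer to it
  for (const [index, output] of (manifest?.outputs ?? []).entries()) {
    if (!output.id) problems.push(`outputs[${index}] is missing id`);
  }

  return problems;
}
```

An empty array means none of these checks failed; it says nothing about whether SeqDesk will accept the package.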

definition.json

The definition describes the Nextflow workflow DAG, that is, each process, its dependencies, and how the steps connect:

{
  "steps": [
    {
      "id": "classification",
      "name": "Classify Contaminant Reads",
      "category": "preprocessing",
      "dependsOn": [],
      "processMatchers": [".*KRAKEN2.*", ".*BBDUK.*"]
    },
    {
      "id": "filter",
      "name": "Filter Reads",
      "category": "preprocessing",
      "dependsOn": ["classification"],
      "processMatchers": [".*FILTER.*"]
    }
  ]
}

Each step lists processMatchers (process-name strings or regex patterns) rather than a processes array. The matchers map live Nextflow process names back to a logical step, which powers the DAG visualization in the monitoring UI. (Real example: pipelines/read-cleaning/definition.json.)
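To make the matching idea concrete, here is a minimal sketch of how a process name could be resolved to a step. It assumes every processMatchers entry is compiled as a regular expression and that the first matching step wins; findStepForProcess and the process names in the usage lines are hypothetical, and SeqDesk's actual matching logic may differ.

```javascript
// Hypothetical helper: map a live Nextflow process name to a logical step id.
// Assumes each processMatchers entry is treated as a regular expression
// and that the first step with a matching pattern wins.
function findStepForProcess(steps, processName) {
  for (const step of steps) {
    const matched = step.processMatchers.some((pattern) =>
      new RegExp(pattern).test(processName)
    );
    if (matched) return step.id;
  }
  return null; // no step claims this process
}

const steps = [
  { id: "classification", dependsOn: [], processMatchers: [".*KRAKEN2.*", ".*BBDUK.*"] },
  { id: "filter", dependsOn: ["classification"], processMatchers: [".*FILTER.*"] },
];

// Process names below are made up for illustration.
console.log(findStepForProcess(steps, "READ_CLEANING:KRAKEN2_CLASSIFY"));    // classification
console.log(findStepForProcess(steps, "READ_CLEANING:SEQKIT_FILTER_READS")); // filter
console.log(findStepForProcess(steps, "MULTIQC"));                           // null
```

Because the first match wins in this sketch, overly broad patterns in an early step would shadow later steps, so keeping matchers specific is the safer choice.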

registry.json

registry.json now focuses on presentation and configuration; the manifest, not the registry, is the runtime source of truth for scope and writeback. The registry configures how the package appears in the UI and which settings the user can edit before launch.

category is a lowercase enum (e.g. analysis, qc). Visibility lives under a visibility object with showToUser and userCanStart (not canStart or showToUsers). configSchema is a JSON Schema object: type: "object" with a properties map, where each property uses title, description, and default, optionally an enum, plus optional SeqDesk UI hints under x-seqdesk:

{
  "id": "my-pipeline",
  "name": "My Pipeline",
  "description": "Description shown to users",
  "category": "qc",
  "version": "0.1.0",
  "visibility": { "showToUser": false, "userCanStart": false },
  "configSchema": {
    "type": "object",
    "properties": {
      "stubMode": {
        "type": "boolean",
        "title": "Stub Mode",
        "description": "Run in test mode",
        "default": false
      },
      "filteringTool": {
        "type": "string",
        "title": "Filtering Tool",
        "enum": ["seqkit", "bbmap"],
        "default": "seqkit"
      }
    }
  }
}

(Real examples: pipelines/read-cleaning/registry.json, pipelines/mag/registry.json.)
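One way to sanity-check a configSchema is to derive the default launch configuration from it and confirm that user-supplied values respect any enum. The sketch below does both; defaultsFromSchema and findEnumViolations are hypothetical helper names, and the code covers only the default and enum keywords rather than full JSON Schema validation.

```javascript
// Hypothetical helpers for inspecting a registry.json configSchema.
// Only the "default" and "enum" keywords are considered in this sketch.

// Collect the default value of every property that declares one.
function defaultsFromSchema(configSchema) {
  const defaults = {};
  for (const [key, property] of Object.entries(configSchema.properties ?? {})) {
    if ("default" in property) defaults[key] = property.default;
  }
  return defaults;
}

// Return the keys whose configured value is not listed in the property's enum.
function findEnumViolations(configSchema, config) {
  const violations = [];
  for (const [key, property] of Object.entries(configSchema.properties ?? {})) {
    if (!Array.isArray(property.enum) || !(key in config)) continue;
    if (!property.enum.includes(config[key])) violations.push(key);
  }
  return violations;
}

const configSchema = {
  type: "object",
  properties: {
    stubMode: { type: "boolean", title: "Stub Mode", default: false },
    filteringTool: { type: "string", enum: ["seqkit", "bbmap"], default: "seqkit" },
  },
};

console.log(defaultsFromSchema(configSchema));
console.log(findEnumViolations(configSchema, { filteringTool: "fastp" }));
```

For real validation, a full JSON Schema validator is a better fit than hand-written checks like these.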

samplesheet.yaml

Defines how SeqDesk generates the package input file from database records. Everything nests under a top-level samplesheet: key with format, filename, rows.scope, and a columns list. Each column draws from a source (e.g. sample.sampleId, study.id, read.file1, order.platform), and may declare required, a default, filters, and a transform (such as prepend_path or map_value):

samplesheet:
  format: csv
  filename: samplesheet.csv
  rows:
    scope: sample
  columns:
    - name: sample
      source: sample.sampleId
    - name: group
      source: study.id
      description: Used for co-assembly grouping
    - name: short_reads_1
      source: read.file1
      required: true
      transform:
        type: prepend_path
        base: "${DATA_BASE_PATH}"
    - name: short_reads_2
      source: read.file2
      required: false
      transform:
        type: prepend_path
        base: "${DATA_BASE_PATH}"

SeqDesk uses this definition to automatically generate the package input from the selected study samples or order-linked reads. Some packages instead point samplesheet.generator at a script (e.g. scripts/generate-samplesheet.mjs, as read-cleaning does) when row generation is more involved. (Real example: pipelines/mag/samplesheet.yaml.)
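The following sketch shows roughly how column definitions like these could be turned into CSV rows, which can be handy when testing a samplesheet against dummy records. Several things here are assumptions: the record shape ({ sample, study, read }), the helper names, the reading of prepend_path as "join base and value with a slash", and the lack of CSV quoting. It is not the generator SeqDesk ships.

```javascript
// Hypothetical samplesheet generator for dummy data.
// Assumed record shape: { sample: {...}, study: {...}, read: {...} }.

// Resolve a dotted source such as "read.file1" against one record.
function resolveSource(record, source) {
  return source.split(".").reduce((value, key) => value?.[key], record);
}

// Apply a column transform. Only prepend_path is sketched; it is assumed
// to join the (variable-expanded) base and the value with "/".
function applyTransform(value, transform, env) {
  if (!transform || value == null) return value;
  if (transform.type === "prepend_path") {
    const base = transform.base.replace(/\$\{(\w+)\}/g, (_, name) => env[name] ?? "");
    return `${base.replace(/\/+$/, "")}/${value}`;
  }
  return value; // other transform types are passed through unchanged
}

// Build CSV text: one header line, then one line per record.
// Values are not quoted, so commas inside values are not supported here.
function buildCsv(columns, records, env) {
  const header = columns.map((column) => column.name).join(",");
  const lines = records.map((record) =>
    columns
      .map((column) => {
        const raw = resolveSource(record, column.source) ?? column.default;
        if (raw == null && column.required) {
          throw new Error(`missing required column: ${column.name}`);
        }
        return applyTransform(raw, column.transform, env) ?? "";
      })
      .join(",")
  );
  return [header, ...lines].join("\n");
}
```

Calling buildCsv with the short_reads_1 and short_reads_2 columns above, one record that has only read.file1, and an env of { DATA_BASE_PATH: "/data" } leaves the optional second read column empty, while a record with no read.file1 raises an error.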

Output Discovery

For many packages, outputs can be located entirely through the declarative discovery patterns in the manifest. For more complex cases, a package can also ship a discovery script such as scripts/discover-outputs.mjs that matches files back to samples and emits the metadata keys referenced by writeback.

This keeps order pipelines modular while preserving central validation of actual database updates.
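As an illustration of pattern-based discovery, the sketch below matches relative output paths against a glob-style pattern and assigns each file to a sample by filename. The interpretation of matchSampleBy: "filename" used here (the basename starts with the sample ID, followed by a non-alphanumeric character) is an assumption, as are the helper names; only the * wildcard is supported.

```javascript
// Hypothetical discovery helpers; not SeqDesk's implementation.

// Convert a simple glob (only "*" is supported) to an anchored RegExp.
// "*" matches any run of characters except "/".
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, "[^/]*") + "$");
}

// Group matching files by sample. A file belongs to a sample when its
// basename starts with the sample ID and the next character is not
// alphanumeric (so "S1" does not claim "S10.json"). This rule is an
// assumption about what matching by filename means.
function matchFilesToSamples(files, pattern, sampleIds) {
  const regex = globToRegExp(pattern);
  const matches = {};
  for (const file of files) {
    if (!regex.test(file)) continue;
    const basename = file.split("/").pop();
    const sampleId = sampleIds.find(
      (id) => basename.startsWith(id) && !/[A-Za-z0-9]/.test(basename[id.length] ?? "")
    );
    if (sampleId) (matches[sampleId] ??= []).push(file);
  }
  return matches;
}

const discovered = matchFilesToSamples(
  ["checksums/S1.json", "checksums/S10.json", "logs/S1.json"],
  "checksums/*.json",
  ["S1", "S10"]
);
console.log(discovered);
```

Files that match the pattern but no sample are silently skipped in this sketch; a real discovery script would probably want to report them.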

Result Contracts and Writeback

Every output should have two separate concepts:

  • destination tells SeqDesk where the file or parsed data is stored.
  • result tells SeqDesk what the output means and whether it needs review.

Use result.kind for the user-facing and data-lifecycle behavior:

Kind | Use for
run_artifact | Reports, logs, tables, and downloadable files attached to a run
sample_read_metadata | Checksums, QC metrics, or report links merged into active reads
sample_read_candidate | New read files staged for admin review
sample_read_replace | New read files that become canonical automatically
sample_assembly / sample_bin | Study outputs linked to samples

Use result.writebackPolicy to make destructive or review-sensitive behavior explicit:

Policy | Behavior
none | No canonical data changes
metadata_only | Update metadata on existing canonical records
stage_only | Store candidates without promoting them
admin_review | Require an admin to review and promote outputs
promote_on_success | Create canonical records when the run completes
replace_on_success | Replace or supersede canonical records when the run completes

Read-cleaning pipelines should stage candidate reads instead of silently overwriting active reads:

{
  "id": "cleaned_read_candidates",
  "scope": "sample",
  "destination": "run_artifact",
  "type": "artifact",
  "result": {
    "kind": "sample_read_candidate",
    "writebackPolicy": "admin_review",
    "preview": { "label": "Cleaned read candidate" }
  },
  "discovery": {
    "pattern": "filter/filtered/*_filtered.fastq.gz",
    "matchSampleBy": "filename"
  }
}

SeqDesk stores those files as run artifacts, shows them as pending review, and only switches active reads after an admin promotes selected candidates.
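A package author could lint their own output declarations against this guidance before publishing. The sketch below is one possible check: the policy names come from the policy table in this section, but lintResultContract and its rules (in particular, warning when a sample_read_candidate output uses an automatic policy) are assumptions of this example rather than checks SeqDesk is known to perform.

```javascript
// Hypothetical lint for a manifest output's result contract.
// Policy names come from the documentation; the rules are assumptions.
const WRITEBACK_POLICIES = [
  "none",
  "metadata_only",
  "stage_only",
  "admin_review",
  "promote_on_success",
  "replace_on_success",
];

// Policies that change canonical records without an admin step.
const AUTOMATIC_POLICIES = new Set(["promote_on_success", "replace_on_success"]);

function lintResultContract(output) {
  const warnings = [];
  const { kind, writebackPolicy } = output.result ?? {};

  if (!kind) warnings.push(`${output.id}: result.kind is not set`);

  if (writebackPolicy !== undefined && !WRITEBACK_POLICIES.includes(writebackPolicy)) {
    warnings.push(`${output.id}: unknown writebackPolicy "${writebackPolicy}"`);
  }

  // Candidate reads are meant to be staged for review, not applied automatically.
  if (kind === "sample_read_candidate" && AUTOMATIC_POLICIES.has(writebackPolicy)) {
    warnings.push(`${output.id}: candidate reads would change canonical data without review`);
  }

  return warnings;
}
```

Running this over every entry in outputs gives a quick list of declarations worth a second look; an empty result does not replace SeqDesk's central validation.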

Testing Pipeline Integrations

Before publishing a package, test the full path with small fixtures:

  1. Validate the descriptor with npm run pipeline:validate.
  2. Run samplesheet generation against dummy order or study data.
  3. Run output discovery against a tiny output fixture directory.
  4. Resolve discovered outputs and confirm the expected database writes.
  5. For read writeback, test both safe no-overwrite behavior and explicit promotion/replacement behavior.

Use dummy FASTQ files for fast contract tests. Add heavier synthetic contamination/reference fixtures later when validating the real classifier configuration.
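Dummy FASTQ content for such contract tests can be generated in a few lines. The sketch below builds a small, syntactically valid FASTQ string (four lines per read: header, sequence, "+", and a quality string of the same length); dummyFastq and the read naming scheme are made up for this example, and writing the string to a fixture file is left to the test harness.

```javascript
// Hypothetical fixture helper: build a tiny, well-formed FASTQ string.
// Each record has four lines: @header, sequence, "+", quality.
function dummyFastq(sampleId, readCount = 2, readLength = 8) {
  const bases = "ACGT";
  const lines = [];
  for (let i = 0; i < readCount; i++) {
    let sequence = "";
    for (let j = 0; j < readLength; j++) {
      sequence += bases[(i + j) % bases.length]; // deterministic, no randomness
    }
    lines.push(
      `@${sampleId}_read${i + 1}`,
      sequence,
      "+",
      "I".repeat(readLength) // quality string must match the sequence length
    );
  }
  return lines.join("\n") + "\n";
}

console.log(dummyFastq("S1", 1, 4));
```

Deterministic content keeps checksums stable across test runs, which is useful when the pipeline under test writes checksum metadata back to reads.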

Registration

New pipelines are automatically discovered from the pipelines/ directory. SeqDesk loads the manifest and registry files together, derives pipeline scope from the manifest, and exposes the package through the installed-pipeline list and public registry metadata.

To enable a pipeline, set its enabled flag in the PipelineConfig database table or through the admin settings.