Adding Custom Pipelines
SeqDesk uses a modular pipeline package system. Each installed package lives in
pipelines/{pipeline-id}/ and combines runtime metadata, UI configuration,
generated input definitions, and optional discovery helpers.
Package Structure
pipelines/{pipeline-id}/
├── manifest.json # Runtime contract: targets, inputs, outputs, writeback
├── definition.json # Workflow DAG with step dependencies
├── registry.json # UI configuration and config schema
├── samplesheet.yaml # Generated input definitions
├── README.md # Documentation
└── scripts/
    └── discover-outputs.mjs # Optional output discovery/writeback helper
manifest.json
The manifest is the runtime source of truth for what a package supports, what it reads, how it executes, what it produces, and which canonical records it may write back to.
{
  "manifestVersion": 1,
  "package": {
    "id": "my-pipeline",
    "name": "My Pipeline",
    "version": "1.0.0",
    "description": "Description shown in SeqDesk"
  },
  "targets": {
    "supported": ["order"]
  },
  "inputs": [
    {
      "id": "reads",
      "scope": "sample",
      "source": "sample.reads",
      "required": true
    }
  ],
  "execution": {
    "type": "nextflow",
    "pipeline": "./workflow",
    "version": "1.0.0",
    "profiles": ["conda"]
  },
  "outputs": [
    {
      "id": "sample_checksums",
      "scope": "sample",
      "destination": "sample_reads",
      "type": "artifact",
      "writeback": {
        "target": "Read",
        "mode": "merge",
        "fields": {
          "checksum1": "checksum1",
          "checksum2": "checksum2"
        }
      },
      "discovery": {
        "pattern": "checksums/*.json",
        "matchSampleBy": "filename"
      }
    }
  ]
}
Important manifest responsibilities:
- targets.supported declares whether a package is meant for study, order, or both
- inputs[].source describes which SeqDesk records the package consumes
- outputs[].discovery tells SeqDesk how to locate produced files
- outputs[].result declares what the output means to SeqDesk: artifact, report, metadata update, staged read candidate, or canonical replacement
- outputs[].writeback declares safe canonical destinations such as Read; actual database writes are still validated and executed centrally by SeqDesk
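The responsibilities above can be illustrated with a small structural check. This is only a sketch under assumptions: the function name checkManifest and the specific rules are hypothetical, inferred from the example manifest, and it does not replace npm run pipeline:validate or SeqDesk's central validation.

```javascript
// Hypothetical helper: collects obvious structural problems in a parsed manifest.
// The rules below are assumptions based on the example manifest, not SeqDesk's real validator.
function checkManifest(manifest) {
  const problems = [];
  const allowedTargets = new Set(["study", "order"]);

  if (!manifest.package || !manifest.package.id) {
    problems.push("package.id is missing");
  }

  const supported = (manifest.targets && manifest.targets.supported) || [];
  if (supported.length === 0) {
    problems.push("targets.supported must list at least one target");
  }
  for (const target of supported) {
    if (!allowedTargets.has(target)) {
      problems.push(`unknown target: ${target}`);
    }
  }

  // Every input should say which SeqDesk records it consumes.
  for (const input of manifest.inputs || []) {
    if (!input.id || !input.source) {
      problems.push("every input needs an id and a source");
    }
  }

  // Every output should say where its file or parsed data is stored.
  for (const output of manifest.outputs || []) {
    if (!output.id || !output.destination) {
      problems.push("every output needs an id and a destination");
    }
  }
  return problems;
}
```

Calling checkManifest on the parsed example manifest returns an empty array; a manifest with a target other than study or order yields an "unknown target" entry.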
definition.json
The definition describes the Nextflow workflow DAG — each process, its dependencies, and how steps connect:
{
  "steps": [
    {
      "id": "classification",
      "name": "Classify Contaminant Reads",
      "category": "preprocessing",
      "dependsOn": [],
      "processMatchers": [".*KRAKEN2.*", ".*BBDUK.*"]
    },
    {
      "id": "filter",
      "name": "Filter Reads",
      "category": "preprocessing",
      "dependsOn": ["classification"],
      "processMatchers": [".*FILTER.*"]
    }
  ]
}
Each step lists processMatchers (process-name strings or regex patterns), not
a processes array; these map live Nextflow process names back to a logical
step. This powers the DAG visualization in the monitoring UI. (Real example:
pipelines/read-cleaning/definition.json.)
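As a rough illustration of how processMatchers could map a live process name back to a step, consider the sketch below. The helper stepForProcess is hypothetical, and anchoring each pattern to the full process name is an assumption; SeqDesk's actual matching rules may differ.

```javascript
// Steps reduced to the fields needed here, taken from the example definition above.
const steps = [
  { id: "classification", dependsOn: [], processMatchers: [".*KRAKEN2.*", ".*BBDUK.*"] },
  { id: "filter", dependsOn: ["classification"], processMatchers: [".*FILTER.*"] },
];

// Hypothetical helper: returns the id of the first step with a matcher that
// matches the whole process name, or null when no step claims the process.
function stepForProcess(stepList, processName) {
  for (const step of stepList) {
    for (const matcher of step.processMatchers) {
      // Assumption: each matcher is treated as a regex anchored to the full name.
      if (new RegExp(`^(?:${matcher})$`).test(processName)) {
        return step.id;
      }
    }
  }
  return null;
}
```

With a made-up process name, stepForProcess(steps, "READ_CLEANING:KRAKEN2_CLASSIFY") returns "classification", while a name that matches no pattern returns null. Because the first matching step wins, step order matters when patterns overlap.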
registry.json
registry.json now focuses on presentation and configuration; it is no longer
the runtime source of truth for scope or writeback. It configures how the
package appears in the UI and which settings the user can edit before launch.
category is a lowercase enum (e.g. analysis, qc). Visibility lives under a
visibility object with showToUser and userCanStart (not canStart or
showToUsers). configSchema is a JSON Schema object: type: "object" with a
properties map, where each property uses title, description, and default,
optionally an enum, plus optional SeqDesk UI hints under x-seqdesk:
{
  "id": "my-pipeline",
  "name": "My Pipeline",
  "description": "Description shown to users",
  "category": "qc",
  "version": "0.1.0",
  "visibility": {
    "showToUser": false,
    "userCanStart": false
  },
  "configSchema": {
    "type": "object",
    "properties": {
      "stubMode": {
        "type": "boolean",
        "title": "Stub Mode",
        "description": "Run in test mode",
        "default": false
      },
      "filteringTool": {
        "type": "string",
        "title": "Filtering Tool",
        "enum": ["seqkit", "bbmap"],
        "default": "seqkit"
      }
    }
  }
}
(Real examples: pipelines/read-cleaning/registry.json,
pipelines/mag/registry.json.)
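To show how a configSchema like this might be consumed, here is a minimal sketch that derives the initial settings from the declared defaults and checks user edits against type and enum. Both helper names (defaultsFromSchema, configProblems) are hypothetical, and the checks cover only primitive types; this is not SeqDesk's own form or validation logic.

```javascript
// Hypothetical helper: builds the initial launch settings from configSchema defaults.
function defaultsFromSchema(configSchema) {
  const config = {};
  for (const [key, prop] of Object.entries(configSchema.properties || {})) {
    if ("default" in prop) {
      config[key] = prop.default;
    }
  }
  return config;
}

// Hypothetical helper: lists settings that break a property's type or enum.
// Only boolean, string, and number types are checked in this sketch.
function configProblems(configSchema, config) {
  const problems = [];
  const properties = configSchema.properties || {};
  for (const [key, value] of Object.entries(config)) {
    const prop = properties[key];
    if (!prop) {
      problems.push(`unknown setting: ${key}`);
      continue;
    }
    const isPrimitive = ["boolean", "string", "number"].includes(prop.type);
    if (isPrimitive && typeof value !== prop.type) {
      problems.push(`${key} should be a ${prop.type}`);
    }
    if (Array.isArray(prop.enum) && !prop.enum.includes(value)) {
      problems.push(`${key} must be one of: ${prop.enum.join(", ")}`);
    }
  }
  return problems;
}
```

For the example schema, defaultsFromSchema yields stubMode: false and filteringTool: "seqkit"; passing a filteringTool outside the enum to configProblems produces a single "must be one of" message.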
samplesheet.yaml
Defines how SeqDesk generates the package input file from database records.
Everything nests under a top-level samplesheet: key with format, filename,
rows.scope, and a columns list. Each column draws from a source
(e.g. sample.sampleId, study.id, read.file1, order.platform), and may
declare required, a default, filters, and a transform (such as
prepend_path or map_value):
samplesheet:
  format: csv
  filename: samplesheet.csv
  rows:
    scope: sample
  columns:
    - name: sample
      source: sample.sampleId
    - name: group
      source: study.id
      description: Used for co-assembly grouping
    - name: short_reads_1
      source: read.file1
      required: true
      transform:
        type: prepend_path
        base: "${DATA_BASE_PATH}"
    - name: short_reads_2
      source: read.file2
      required: false
      transform:
        type: prepend_path
        base: "${DATA_BASE_PATH}"
SeqDesk uses this definition to automatically generate the package input from
the selected study samples or order-linked reads. Some packages instead point
samplesheet.generator at a script (e.g.
scripts/generate-samplesheet.mjs, as read-cleaning does) when row
generation is more involved. (Real example: pipelines/mag/samplesheet.yaml.)
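The column rules can be sketched as a small row builder. This is an illustration under assumptions, not the generator SeqDesk ships: the helper names, the record shape ({ sample: {...}, read: {...} }), the literal base path "/data" standing in for the resolved ${DATA_BASE_PATH}, and the order in which required and default are applied are all hypothetical. CSV quoting is omitted for brevity.

```javascript
// Column definitions mirroring part of the samplesheet.yaml above, already parsed.
// "/data" is an assumed, already-resolved value for ${DATA_BASE_PATH}.
const columns = [
  { name: "sample", source: "sample.sampleId" },
  { name: "short_reads_1", source: "read.file1", required: true,
    transform: { type: "prepend_path", base: "/data" } },
  { name: "short_reads_2", source: "read.file2", required: false,
    transform: { type: "prepend_path", base: "/data" } },
];

// Looks up a dotted source such as "read.file1" in a nested record.
function resolveSource(record, source) {
  return source
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), record);
}

// Builds one row: missing required values throw, missing optional values
// fall back to the column default or an empty string.
function buildRow(columnList, record) {
  return columnList.map((column) => {
    let value = resolveSource(record, column.source);
    if (value == null || value === "") {
      if (column.required) {
        throw new Error(`missing required column: ${column.name}`);
      }
      return column.default ?? "";
    }
    if (column.transform && column.transform.type === "prepend_path") {
      // Strip trailing slashes from the base so the join has exactly one.
      value = `${column.transform.base.replace(/\/+$/, "")}/${value}`;
    }
    return String(value);
  });
}

// Joins the header and rows into CSV text (no quoting or escaping in this sketch).
function buildCsv(columnList, records) {
  const header = columnList.map((column) => column.name).join(",");
  const lines = records.map((record) => buildRow(columnList, record).join(","));
  return [header, ...lines].join("\n");
}
```

A record with sampleId "S1" and only file1 set produces a row whose last cell is empty, while a record without file1 throws because short_reads_1 is required.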
Output Discovery
For many packages, outputs can be resolved from the declarative discovery
patterns in the manifest. For more complex cases, a package can also ship a
discovery script such as scripts/discover-outputs.mjs that matches files back
to samples and emits the metadata keys referenced by writeback.
This keeps order pipelines modular while preserving central validation of actual database updates.
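A discovery helper along these lines might look like the sketch below. It is not the contract SeqDesk expects from scripts/discover-outputs.mjs: the function names, the return shape, the glob support (only "*"), and the reading of matchSampleBy: "filename" as "the file name starts with the sample id" are all assumptions made for illustration.

```javascript
// Converts a simple glob into an anchored RegExp. Only "*" is supported here,
// and it never crosses a "/" boundary.
function globToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^/]*");
  return new RegExp(`^${escaped}$`);
}

// Hypothetical discovery: keeps files that match the pattern and assigns each
// to the sample whose id is a prefix of the file's base name.
function discoverOutputs(files, sampleIds, pattern) {
  const matcher = globToRegExp(pattern);
  const matches = [];
  for (const file of files) {
    if (!matcher.test(file)) continue;
    const baseName = file.split("/").pop();
    // Prefer the longest id so "S1" does not claim files that belong to "S10".
    const sampleId = sampleIds
      .filter((id) => baseName.startsWith(id))
      .sort((a, b) => b.length - a.length)[0];
    if (sampleId) {
      matches.push({ sampleId, path: file });
    }
  }
  return matches;
}
```

Run against a tiny fixture listing with the pattern "checksums/*.json", this keeps only files directly inside checksums/ and drops files that cannot be tied to a known sample.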
Result Contracts and Writeback
Every output should have two separate concepts:
- destination tells SeqDesk where the file or parsed data is stored.
- result tells SeqDesk what the output means and whether it needs review.
Use result.kind for the user-facing and data-lifecycle behavior:
| Kind | Use for |
|---|---|
| run_artifact | Reports, logs, tables, and downloadable files attached to a run |
| sample_read_metadata | Checksums, QC metrics, or report links merged into active reads |
| sample_read_candidate | New read files staged for admin review |
| sample_read_replace | New read files that become canonical automatically |
| sample_assembly / sample_bin | Study outputs linked to samples |
Use result.writebackPolicy to make destructive or review-sensitive behavior
explicit:
| Policy | Behavior |
|---|---|
| none | No canonical data changes |
| metadata_only | Update metadata on existing canonical records |
| stage_only | Store candidates without promoting them |
| admin_review | Require an admin to review and promote outputs |
| promote_on_success | Create canonical records when the run completes |
| replace_on_success | Replace or supersede canonical records when the run completes |
Read-cleaning pipelines should stage candidate reads instead of silently overwriting active reads:
{
  "id": "cleaned_read_candidates",
  "scope": "sample",
  "destination": "run_artifact",
  "type": "artifact",
  "result": {
    "kind": "sample_read_candidate",
    "writebackPolicy": "admin_review",
    "preview": {
      "label": "Cleaned read candidate"
    }
  },
  "discovery": {
    "pattern": "filter/filtered/*_filtered.fastq.gz",
    "matchSampleBy": "filename"
  }
}
SeqDesk stores those files as run artifacts, shows them as pending review, and only switches active reads after an admin promotes selected candidates.
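The advice to stage candidate reads rather than overwrite active ones can be expressed as a small lint over an output's result contract. The sketch below is hypothetical: lintResultContract is not a SeqDesk function, and treating promote_on_success and replace_on_success as "automatic" policies that should not be combined with sample_read_candidate is an interpretation of the tables above, not a documented rule.

```javascript
// Policy names copied from the writebackPolicy table above.
const KNOWN_POLICIES = new Set([
  "none",
  "metadata_only",
  "stage_only",
  "admin_review",
  "promote_on_success",
  "replace_on_success",
]);

// Assumption: these policies change canonical data without a review step.
const AUTOMATIC_POLICIES = new Set(["promote_on_success", "replace_on_success"]);

// Hypothetical lint: returns warnings for an output's result contract.
function lintResultContract(output) {
  const result = output.result;
  if (!result) {
    return [`output ${output.id} has no result contract`];
  }
  const warnings = [];
  if (!KNOWN_POLICIES.has(result.writebackPolicy)) {
    warnings.push(`output ${output.id} uses unknown writebackPolicy: ${result.writebackPolicy}`);
  }
  if (result.kind === "sample_read_candidate" && AUTOMATIC_POLICIES.has(result.writebackPolicy)) {
    warnings.push(`output ${output.id} stages candidate reads but changes canonical data automatically`);
  }
  return warnings;
}
```

The cleaned_read_candidates output above passes this lint with no warnings; switching its writebackPolicy to replace_on_success would produce one.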
Testing Pipeline Integrations
Before publishing a package, test the full path with small fixtures:
- Validate the descriptor with npm run pipeline:validate.
- Run samplesheet generation against dummy order or study data.
- Run output discovery against a tiny output fixture directory.
- Resolve discovered outputs and confirm the expected database writes.
- For read writeback, test both safe no-overwrite behavior and explicit promotion/replacement behavior.
Use dummy FASTQ files for fast contract tests. Add heavier synthetic contamination/reference fixtures later when validating the real classifier configuration.
Registration
New pipelines are automatically discovered from the pipelines/ directory.
SeqDesk loads the manifest and registry files together, derives pipeline scope
from the manifest, and exposes the package through the installed-pipeline list
and public registry metadata.
To enable a pipeline, set its enabled flag in the PipelineConfig database
table or through the admin settings.