Adding Custom Pipelines

SeqDesk uses a modular pipeline package system. Each installed package lives in pipelines/{pipeline-id}/ and combines runtime metadata, UI configuration, generated input definitions, and optional discovery helpers.

Lifecycle: From Manifest to Writeback

A pipeline follows the same path whether it is built in or contributed. Each step links to the page that covers it in depth, so this is a map rather than a repeat:

Author the package in pipelines/<id>/ (manifest.json, definition.json, registry.json, samplesheet.yaml, plus README.md and the optional scripts/ and parsers/ directories). Start from the pipelines/_example/ scaffold.
Validate the descriptor with npm run pipeline:validate pipelines/<id> (see Testing Pipeline Integrations).
Discovery and scope — SeqDesk auto-discovers packages under pipelines/ and derives Study versus Sequencing Order scope from manifest.targets.supported (see Registration and Pipeline Scopes).
Input generation — samplesheet.yaml builds the Nextflow input file from canonical records (see Running Pipelines).
Execution — local or on a Slurm cluster, in both cases via Nextflow (see Running Pipelines).
Monitoring — live progress via the Nextflow weblog and trace (see Monitoring).
Output discovery and writeback — produced files are matched back to samples, and database writes are validated centrally, with review-sensitive changes staged (see Results).

Worked example: a minimal pipeline

The smallest real package in the repo is pipelines/fastq-checksum/. It computes MD5 checksums for the FASTQ files already linked to an order’s samples. Because it takes no user-facing parameters (its configSchema.properties is empty), it is the cleanest illustration of the wiring every package needs. Read it top to bottom, then copy it.

`manifest.json` — the source of truth

The manifest is what the executor reads. Everything else (registry, definition, samplesheet) is wiring the manifest points at.


{
  "manifestVersion": 1,
  "package": {
    "id": "fastq-checksum",
    "name": "FASTQ Checksum",
    "version": "0.1.0",
    "description": "Compute MD5 checksums for linked FASTQ files in an order",
    "provider": "SeqDesk"
  },
  "files": {
    "definition": "definition.json",
    "registry": "registry.json",
    "samplesheet": "samplesheet.yaml",
    "readme": "README.md",
    "scripts": { "discoverOutputs": "scripts/discover-outputs.mjs" }
  },
  "targets": { "supported": ["order"] },
  "inputs": [
    { "id": "reads", "scope": "sample", "source": "sample.reads", "required": true }
  ],
  "execution": {
    "type": "nextflow",
    "pipeline": "./workflow",
    "version": "0.1.0",
    "profiles": ["conda"],
    "runtime": { "allowMacOsArmConda": true },
    "defaultParams": {}
  },
  "outputs": [
    {
      "id": "sample_checksums",
      "scope": "sample",
      "destination": "sample_reads",
      "type": "artifact",
      "fromStep": "checksum",
      "writeback": {
        "target": "Read",
        "mode": "merge",
        "fields": { "checksum1": "checksum1", "checksum2": "checksum2" }
      },
      "discovery": { "pattern": "checksums/*.json", "matchSampleBy": "filename" }
    },
    {
      "id": "summary",
      "scope": "run",
      "destination": "run_artifact",
      "type": "artifact",
      "fromStep": "checksum",
      "discovery": { "pattern": "summary/checksum-summary.tsv" }
    }
  ],
  "schema_requirements": { "tables": ["Read", "PipelineRun", "PipelineArtifact"] }
  // an optional "ui" block (result-table formatting) is also present; trimmed here
}

Why each field is there:

package.id must equal the folder name (fastq-checksum); the validator checks this. version and description are surfaced in the UI.
files points at the sibling files the loader must read. scripts.discoverOutputs names the package-local script that maps produced files back to the declared outputs.
targets.supported: ["order"] declares this package runs against a Sequencing Order. The allowed targets are study and order.
inputs[].source: "sample.reads" with required: true means the package needs Read records on every selected sample; the executor refuses to start a sample that has none.
execution.pipeline: "./workflow" runs the bundled local Nextflow workflow folder (an nf-core package would put e.g. "nf-core/detaxizer" here instead). profiles: ["conda"] is the Nextflow profile. allowMacOsArmConda: true permits the conda profile on Apple-Silicon dev machines. defaultParams: {} is empty because this pipeline has no parameters.
outputs[]: each entry is found via discovery.pattern (a glob, relative to the run output dir) and tied to the producing step via fromStep (which must match a definition.steps[].id). matchSampleBy: "filename" associates a found file with a sample by the sample ID in its name.
writeback is what makes the checksum land in the database: target: "Read", mode: "merge", and fields mapping the discovered JSON keys onto the canonical Read columns of the same name.
schema_requirements.tables documents the DB tables this package touches.

`registry.json` — how it shows up in the UI


{
  "id": "fastq-checksum",
  "name": "FASTQ Checksum",
  "category": "qc",
  "version": "0.1.0",
  "sortOrder": 2,
  "requires": {
    "reads": true, "assemblies": false, "bins": false,
    "checksums": false, "studyAccession": false, "sampleMetadata": false
  },
  "visibility": { "showToUser": false, "userCanStart": false },
  "input": {
    "supportedScopes": ["order"],
    "minSamples": 1,
    "perSample": { "reads": true, "readMode": "single_or_paired", "pairedEnd": false }
  },
  "samplesheet": { "format": "csv", "generator": "samplesheet.yaml" },
  "configSchema": { "type": "object", "properties": {} },
  "defaultConfig": {},
  "icon": "Hash"
  // an "outputs" array (UI output descriptions) is also present; trimmed here
}

requires gates which orders can run it: reads: true hides it for orders with no reads. visibility.showToUser: false / userCanStart: false keep this an admin-only internal step.
input.minSamples: 1 and perSample.reads: true restate the data precondition for the picker.
samplesheet.generator: "samplesheet.yaml" says the samplesheet is built from the declarative YAML rather than a custom script.
configSchema.properties: {} — empty, so this pipeline renders no settings form. (See Settings and parameter mapping by example for a package that does.)
icon: "Hash" is the lucide icon name shown in the UI.

`samplesheet.yaml` — turning DB rows into the Nextflow input


samplesheet:
  format: csv
  filename: samplesheet.csv
  rows:
    scope: sample
  columns:
    - name: sample_id
      source: sample.sampleId
      required: true
    - name: fastq_1
      source: read.file1
      required: true
      transform:
        type: prepend_path
        base: '${DATA_BASE_PATH}'
    - name: fastq_2
      source: read.file2
      required: false
      default: ""
      transform:
        type: prepend_path
        base: '${DATA_BASE_PATH}'

One CSV row per sample (rows.scope: sample). Each column pulls a value from a sample.* or read.* field. The prepend_path transform with base: '${DATA_BASE_PATH}' rewrites the stored relative path into an absolute path the Nextflow process can open. fastq_2 is required: false with default: "" so single-end samples still produce a valid row.
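
The column rules above can be sketched in TypeScript. This is an illustration only, not SeqDesk's generator: the Column type, the buildRow helper, the flattened record keys, and the way ${DATA_BASE_PATH} is substituted are all assumptions made for the example.

```typescript
// Hypothetical shapes for this sketch; the real generator's types may differ.
type Column = {
  name: string;
  source: string; // e.g. "sample.sampleId" or "read.file1"
  required?: boolean;
  default?: string;
  transform?: { type: "prepend_path"; base: string };
};

type SourceRecord = Record<string, string | undefined>;

// Build one CSV row (one sample) from flattened source values.
function buildRow(cols: Column[], record: SourceRecord, dataBasePath: string): string[] {
  return cols.map((col) => {
    const raw = record[col.source];
    if (raw === undefined || raw === "") {
      if (col.required) {
        throw new Error(`Missing required value for column "${col.name}"`);
      }
      return col.default ?? "";
    }
    if (col.transform?.type === "prepend_path") {
      // Assumption: ${DATA_BASE_PATH} is replaced by the configured data root.
      const base = col.transform.base.replace("${DATA_BASE_PATH}", () => dataBasePath);
      return `${base.replace(/\/+$/, "")}/${raw.replace(/^\/+/, "")}`;
    }
    return raw;
  });
}

const checksumColumns: Column[] = [
  { name: "sample_id", source: "sample.sampleId", required: true },
  {
    name: "fastq_1",
    source: "read.file1",
    required: true,
    transform: { type: "prepend_path", base: "${DATA_BASE_PATH}" },
  },
  {
    name: "fastq_2",
    source: "read.file2",
    required: false,
    default: "",
    transform: { type: "prepend_path", base: "${DATA_BASE_PATH}" },
  },
];

// A single-end sample: no read.file2, so fastq_2 falls back to its default.
const singleEndRow = buildRow(
  checksumColumns,
  { "sample.sampleId": "S1", "read.file1": "runs/S1_R1.fastq.gz" },
  "/data",
);
// singleEndRow is ["S1", "/data/runs/S1_R1.fastq.gz", ""]
```

The empty third cell is what keeps single-end rows valid, while a sample without read.file1 fails fast because that column is required.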

`definition.json` — the step DAG (trimmed)


{
  "pipeline": "fastq-checksum",
  "name": "FASTQ Checksum",
  "version": "0.1.0",
  "steps": [
    {
      "id": "checksum",
      "name": "Calculate Checksums",
      "category": "qc",
      "dependsOn": [],
      "processMatchers": ["CALCULATE_CHECKSUMS", "SUMMARIZE_CHECKSUMS"],
      "tools": ["md5sum"],
      "outputs": ["checksum-json", "checksum-summary"]
    }
  ]
}

The single step’s id: "checksum" is exactly the value the manifest’s outputs referenced as fromStep. processMatchers map live Nextflow process names to this step so the UI can show progress; prefer exact names and use regex only for variable nf-core process names.

Validate before you open a PR

From the repo root:


npm run pipeline:validate pipelines/fastq-checksum

This runs scripts/validate-pipeline-package.ts, which checks manifest schema compliance, that the folder name matches package.id, that every referenced file exists, and that each fromStep points at a real definition.steps[].id. Fix any reported error before pushing (see Troubleshooting).
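
Two of those cross-file checks can be sketched as below. This is not the code in scripts/validate-pipeline-package.ts: the trimmed Manifest and Definition types and the crossCheck helper are hypothetical, and only the message wording is taken from the Troubleshooting tables.

```typescript
// Hypothetical minimal shapes; the real validator works on the full schema.
type Manifest = {
  package: { id: string };
  outputs: { id: string; fromStep?: string }[];
};
type Definition = { pipeline: string; steps: { id: string }[] };

// Return one message per problem; an empty array means both checks pass.
function crossCheck(folderName: string, manifest: Manifest, definition: Definition): string[] {
  const errors: string[] = [];

  // package.id must equal the folder name.
  if (manifest.package.id !== folderName) {
    errors.push(
      `package.id "${manifest.package.id}" does not match expected package id "${folderName}".`,
    );
  }

  // Every fromStep must point at a real definition.steps[].id.
  for (const output of manifest.outputs) {
    const step = output.fromStep;
    if (step !== undefined && !definition.steps.some((s) => s.id === step)) {
      errors.push(
        `Manifest output "${output.id}" references missing definition step "${step}".`,
      );
    }
  }
  return errors;
}

const checksumManifest: Manifest = {
  package: { id: "fastq-checksum" },
  outputs: [
    { id: "sample_checksums", fromStep: "checksum" },
    { id: "summary", fromStep: "checksum" },
  ],
};
const checksumDefinition: Definition = {
  pipeline: "fastq-checksum",
  steps: [{ id: "checksum" }],
};

const problems = crossCheck("fastq-checksum", checksumManifest, checksumDefinition); // []
```

Renaming the step in definition.json without updating the manifest's fromStep is the kind of drift this check is meant to catch.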

Package Structure


pipelines/{pipeline-id}/
├── manifest.json         # Runtime contract: targets, inputs, outputs, writeback
├── definition.json       # Workflow DAG with step dependencies
├── registry.json         # UI configuration and config schema
├── samplesheet.yaml      # Generated input definitions
├── README.md             # Documentation
└── scripts/
    └── discover-outputs.mjs  # Optional output discovery/writeback helper

manifest.json

The manifest is the runtime source of truth for what a package supports, what it reads, how it executes, what it produces, and which canonical records it may write back to.


{
  "manifestVersion": 1,
  "package": {
    "id": "my-pipeline",
    "name": "My Pipeline",
    "version": "1.0.0",
    "description": "Description shown in SeqDesk"
  },
  "targets": {
    "supported": ["order"]
  },
  "inputs": [
    {
      "id": "reads",
      "scope": "sample",
      "source": "sample.reads",
      "required": true
    }
  ],
  "execution": {
    "type": "nextflow",
    "pipeline": "./workflow",
    "version": "1.0.0",
    "profiles": ["conda"]
  },
  "outputs": [
    {
      "id": "sample_checksums",
      "scope": "sample",
      "destination": "sample_reads",
      "type": "artifact",
      "writeback": {
        "target": "Read",
        "mode": "merge",
        "fields": {
          "checksum1": "checksum1",
          "checksum2": "checksum2"
        }
      },
      "discovery": {
        "pattern": "checksums/*.json",
        "matchSampleBy": "filename"
      }
    }
  ]
}

Important manifest responsibilities:

targets.supported declares whether a package is meant for study, order, or both
inputs[].source describes which SeqDesk records the package consumes
outputs[].discovery tells SeqDesk how to locate produced files
outputs[].result declares what the output means to SeqDesk: artifact, report, metadata update, staged read candidate, or canonical replacement
outputs[].writeback declares safe canonical destinations such as Read
actual database writes are still validated and executed centrally by SeqDesk

definition.json

The definition describes the Nextflow workflow DAG — each process, its dependencies, and how steps connect:


{
  "steps": [
    {
      "id": "classification",
      "name": "Classify Contaminant Reads",
      "category": "preprocessing",
      "dependsOn": [],
      "processMatchers": [".*KRAKEN2.*", ".*BBDUK.*"]
    },
    {
      "id": "filter",
      "name": "Filter Reads",
      "category": "preprocessing",
      "dependsOn": ["classification"],
      "processMatchers": [".*FILTER.*"]
    }
  ]
}

Each step lists processMatchers (process-name strings or regex patterns), not a processes array — these map live Nextflow process names back to a logical step. This powers the DAG visualization in the monitoring UI. (Real example: pipelines/read-cleaning/definition.json.)
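
How a matcher might resolve a live process name to a step can be sketched as follows. The matching semantics here are an assumption (exact string first, then the matcher as a whole-name regular expression); the stepForProcess helper and the sample process names are made up for the example.

```typescript
type Step = { id: string; processMatchers: string[] };

// Assumption for this sketch: a matcher hits when it equals the process name
// exactly, or when it matches the whole name as a regular expression.
function stepForProcess(allSteps: Step[], processName: string): string | undefined {
  for (const step of allSteps) {
    for (const matcher of step.processMatchers) {
      if (matcher === processName) return step.id;
      try {
        if (new RegExp("^(?:" + matcher + ")$").test(processName)) return step.id;
      } catch {
        // Not a valid regex: the matcher only counts as a literal name.
      }
    }
  }
  return undefined; // no step claims this process
}

const readCleaningSteps: Step[] = [
  { id: "classification", processMatchers: [".*KRAKEN2.*", ".*BBDUK.*"] },
  { id: "filter", processMatchers: [".*FILTER.*"] },
];

const classified = stepForProcess(readCleaningSteps, "KRAKEN2_KRAKEN2"); // "classification"
const unclaimed = stepForProcess(readCleaningSteps, "MULTIQC"); // undefined
```

A process that no step claims (the MULTIQC case) simply has no DAG node to light up, which is why every runnable step should carry at least one matcher.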

registry.json

registry.json now focuses on presentation and configuration rather than being the runtime source of truth for scope or writeback. It configures how the package appears in the UI and which settings the user can edit before launch.

category is a lowercase enum (e.g. analysis, qc). Visibility lives under a visibility object with showToUser / userCanStart (not canStart / showToUsers), and configSchema is a JSON-Schema object (type: "object" with a properties map; each property uses title/description/default, optionally an enum, plus optional SeqDesk UI hints under x-seqdesk):


{
  "id": "my-pipeline",
  "name": "My Pipeline",
  "description": "Description shown to users",
  "category": "qc",
  "version": "0.1.0",
  "visibility": {
    "showToUser": false,
    "userCanStart": false
  },
  "configSchema": {
    "type": "object",
    "properties": {
      "stubMode": {
        "type": "boolean",
        "title": "Stub Mode",
        "description": "Run in test mode",
        "default": false
      },
      "filteringTool": {
        "type": "string",
        "title": "Filtering Tool",
        "enum": ["seqkit", "bbmap"],
        "default": "seqkit"
      }
    }
  }
}

(Real examples: pipelines/read-cleaning/registry.json, pipelines/mag/registry.json.)

samplesheet.yaml

Defines how SeqDesk generates the package input file from database records. Everything nests under a top-level samplesheet: key with format, filename, rows.scope, and a columns list. Each column draws from a source (e.g. sample.sampleId, study.id, read.file1, order.platform), and may declare required, a default, filters, and a transform (such as prepend_path or map_value):


samplesheet:
  format: csv
  filename: samplesheet.csv
  rows:
    scope: sample
  columns:
    - name: sample
      source: sample.sampleId
    - name: group
      source: study.id
      description: Used for co-assembly grouping
    - name: short_reads_1
      source: read.file1
      required: true
      transform:
        type: prepend_path
        base: "${DATA_BASE_PATH}"
    - name: short_reads_2
      source: read.file2
      required: false
      transform:
        type: prepend_path
        base: "${DATA_BASE_PATH}"

SeqDesk uses this definition to automatically generate the package input from the selected study samples or order-linked reads. Some packages instead point samplesheet.generator at a script (e.g. scripts/generate-samplesheet.mjs, as read-cleaning does) when row generation is more involved. (Real example: pipelines/mag/samplesheet.yaml.)

Settings and parameter mapping by example

pipelines/read-cleaning/ (an nf-core/detaxizer wrapper) shows how a UI setting becomes a Nextflow flag. Two files cooperate:

registry.json → configSchema declares the settings form (a JSON Schema with SeqDesk x-seqdesk hints).
manifest.json → execution.paramMap translates each setting key into the Nextflow flag the pipeline expects.

Step 1 — declare the setting (`registry.json`)


{
  "configSchema": {
    "type": "object",
    "properties": {
      "tax2filter": {
        "type": "string",
        "title": "Taxon to Filter",
        "description": "Taxon name or ID passed to detaxizer. Default targets Homo sapiens.",
        "default": "Homo sapiens",
        "x-seqdesk": {
          "placement": "basic",
          "group": "analysis",
          "helpText": "The contaminant taxon must be represented in the configured classifier reference."
        }
      },
      "classificationKraken2": {
        "type": "boolean",
        "title": "Kraken2 Classification",
        "description": "Use Kraken2 to identify contaminant reads.",
        "default": true,
        "x-seqdesk": { "placement": "basic", "group": "analysis" }
      },
      "kraken2Db": {
        "type": "string",
        "title": "Kraken2 DB",
        "description": "Local Kraken2 database path or an explicitly approved reference URI.",
        "default": "",
        "x-seqdesk": {
          "placement": "basic",
          "group": "databases",
          "helpText": "Required when Kraken2 classification is enabled. SeqDesk will not silently choose or download a database.",
          "hideWhenServerConfigured": true
        }
      }
      // … advanced settings (classificationBbduk, bbdukReference,
      // filteringTool, readType, outputRemovedReads) omitted for brevity
    }
  },
  "defaultConfig": {
    "tax2filter": "Homo sapiens",
    "classificationKraken2": true,
    "kraken2Db": ""
    // … remaining defaults omitted
  }
}

Notes:

x-seqdesk.placement (basic vs advanced) controls which tab the field renders in; group (analysis, databases, reporting) clusters fields.
A reference-database key like kraken2Db is just a string setting whose value is a path/URI. hideWhenServerConfigured: true hides the field when the server already provides the DB, so end users do not have to supply it.
Each entry in defaultConfig is the value SeqDesk uses when the user leaves that setting unchanged.

Step 2 — map the key to a Nextflow flag (`manifest.json`)


{
  "execution": {
    "defaultParams": {
      "enableFilter": true,
      "tax2filter": "Homo sapiens",
      "classificationKraken2": true,
      "classificationBbduk": false,
      "filteringTool": "seqkit",
      "outputRemovedReads": false
    },
    "paramMap": {
      "enableFilter": "--enable_filter",
      "tax2filter": "--tax2filter",
      "classificationKraken2": "--classification_kraken2",
      "classificationBbduk": "--classification_bbduk",
      "kraken2Db": "--kraken2db",
      "bbdukReference": "--fasta_bbduk",
      "filteringTool": "--filtering_tool",
      "outputRemovedReads": "--output_removed_reads",
      "readType": ""
    }
  }
}

Step 3 — what the executor produces

The executor merges defaultParams with the user’s config, then applies paramMap. Three representative cases:

UI setting (key)	User value	Nextflow flag emitted
`tax2filter` (string)	`"Homo sapiens"`	`--tax2filter 'Homo sapiens'`
`classificationKraken2` (boolean)	`true`	`--classification_kraken2`
`kraken2Db` (db path)	`/data/kraken2/standard`	`--kraken2db /data/kraken2/standard`

Mapping rules to remember:

A boolean true adds the bare flag (no value); a string adds the flag followed by its shell-escaped value (a value with spaces is single-quoted).
A blank string (the kraken2Db default of "") emits nothing — the flag is skipped. That is why the package README requires a real kraken2Db value whenever Kraken2 is enabled: SeqDesk never substitutes a database for you.
"readType": "" is the empty-string mapping idiom: it marks readType as a SeqDesk-only setting (used by the samplesheet generator) that must never be passed to Nextflow. The executor drops it instead of forwarding a flag.

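The mapping rules above can be sketched in TypeScript. This is a minimal illustration, not the executor's code: buildArgs and shellEscape are hypothetical helpers, the exact quoting rule is an assumption, and so is the choice to emit nothing for a boolean false (the rules above only describe true).

```typescript
type Config = Record<string, string | boolean>;

// Single-quote a value only when it contains characters the shell would split
// or interpret. Assumption: the real executor's escaping may differ in detail.
function shellEscape(value: string): string {
  if (/^[A-Za-z0-9_\/.:=@%+-]+$/.test(value)) return value;
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function buildArgs(
  defaultParams: Config,
  userConfig: Config,
  paramMap: Record<string, string>,
): string[] {
  // User values override the package defaults.
  const merged: Config = { ...defaultParams, ...userConfig };
  const args: string[] = [];
  for (const key of Object.keys(merged)) {
    const flag = paramMap[key];
    const value = merged[key];
    if (!flag) continue; // unmapped key or "" mapping: never forwarded
    if (value === true) {
      args.push(flag); // boolean true: bare flag
    } else if (typeof value === "string" && value !== "") {
      args.push(`${flag} ${shellEscape(value)}`); // string: flag plus escaped value
    }
    // A blank string (and, by assumption, false) emits nothing.
  }
  return args;
}

const readCleaningParamMap: Record<string, string> = {
  tax2filter: "--tax2filter",
  classificationKraken2: "--classification_kraken2",
  kraken2Db: "--kraken2db",
  readType: "",
};

const cliArgs = buildArgs(
  { tax2filter: "Homo sapiens", classificationKraken2: true, kraken2Db: "" },
  { kraken2Db: "/data/kraken2/standard", readType: "short" },
  readCleaningParamMap,
);
// cliArgs is:
// ["--tax2filter 'Homo sapiens'", "--classification_kraken2", "--kraken2db /data/kraken2/standard"]
```

Note that readType never reaches the argument list, and leaving kraken2Db blank would drop --kraken2db entirely rather than pass an empty path.
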
Conditional flags with `paramRules`

read-cleaning uses only paramMap. When one setting must imply extra flags, add an optional execution.paramRules array. This real example is from pipelines/mag/manifest.json — enabling “skip bin QC” forces two downstream skips (the file has a second, analogous rule for skipBusco, omitted here):


{
  "execution": {
    "paramRules": [
      {
        "when": { "skipBinQc": true },
        "add": ["--skip_quast", "--skip_gtdbtk"]
      }
    ]
  }
}

Each rule fires when every key in when matches the merged config; each entry in add is either a bare flag string (as above) or an object { "flag": "--x", "value": "y" } that emits --x y. Rules run after paramMap, and exact duplicate flags are de-duplicated, so it is safe for a rule to re-assert a flag paramMap already set.
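
Rule evaluation as described above can be sketched like this. The applyParamRules helper and its types are hypothetical, and the sketch appends object entries as "flag value" without the shell escaping a real executor would need.

```typescript
type RuleAdd = string | { flag: string; value: string };
type ParamRule = {
  when: Record<string, string | boolean>;
  add: RuleAdd[];
};

// Append the flags of every matching rule to args that paramMap already produced.
function applyParamRules(
  args: string[],
  config: Record<string, string | boolean>,
  rules: ParamRule[],
): string[] {
  const result = args.slice();
  for (const rule of rules) {
    // A rule fires only when every key in `when` matches the merged config.
    const fires = Object.keys(rule.when).every((key) => config[key] === rule.when[key]);
    if (!fires) continue;
    for (const entry of rule.add) {
      const rendered = typeof entry === "string" ? entry : `${entry.flag} ${entry.value}`;
      // Exact duplicates are dropped, so re-asserting a flag is harmless.
      if (result.indexOf(rendered) === -1) result.push(rendered);
    }
  }
  return result;
}

const magRules: ParamRule[] = [
  { when: { skipBinQc: true }, add: ["--skip_quast", "--skip_gtdbtk"] },
];

// paramMap already emitted --skip_quast; the rule adds only the missing flag.
const withRule = applyParamRules(["--skip_quast"], { skipBinQc: true }, magRules);
// withRule is ["--skip_quast", "--skip_gtdbtk"]
```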

Output Discovery

For many packages, outputs can be located entirely from the declarative discovery patterns in the manifest. For more complex cases, a package can also ship a discovery script such as scripts/discover-outputs.mjs to match files back to samples and emit the metadata keys referenced by writeback.

This keeps order pipelines modular while preserving central validation of actual database updates.
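
Matching a discovered file to a sample by filename could look roughly like the sketch below. SeqDesk's actual matching rule is not spelled out here, so the token-boundary and longest-ID-first behavior of this hypothetical matchSampleByFilename helper are assumptions.

```typescript
// Return the sample ID that appears in the file's base name, or undefined.
function matchSampleByFilename(filePath: string, sampleIds: string[]): string | undefined {
  const baseName = filePath.split("/").pop() ?? filePath;
  // Try longer IDs first so "S10" is not mistaken for "S1".
  const candidates = sampleIds.slice().sort((a, b) => b.length - a.length);
  // An ID only counts when it is not glued to other letters or digits.
  const isBoundary = (ch: string): boolean => ch === "" || /[^A-Za-z0-9]/.test(ch);
  for (const id of candidates) {
    const at = baseName.indexOf(id);
    if (at === -1) continue;
    const before = baseName.charAt(at - 1);
    const after = baseName.charAt(at + id.length);
    if (isBoundary(before) && isBoundary(after)) return id;
  }
  return undefined;
}

const matched = matchSampleByFilename("checksums/S10.json", ["S1", "S10"]); // "S10"
```

A package whose file names do not embed the sample ID in a recognizable way is the typical case for shipping a discovery script instead.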

Result Contracts and Writeback

Every output should declare two separate concepts:

destination tells SeqDesk where the file or parsed data is stored.
result tells SeqDesk what the output means and whether it needs review.

Use result.kind for the user-facing and data-lifecycle behavior:

Kind	Use for
`run_artifact`	Reports, logs, tables, and downloadable files attached to a run
`sample_read_metadata`	Checksums, QC metrics, or report links merged into active reads
`sample_read_candidate`	New read files staged for admin review
`sample_read_replace`	New read files that become canonical automatically
`sample_assembly` / `sample_bin`	Study outputs linked to samples

Use result.writebackPolicy to make destructive or review-sensitive behavior explicit:

Policy	Behavior
`none`	No canonical data changes
`metadata_only`	Update metadata on existing canonical records
`stage_only`	Store candidates without promoting them
`admin_review`	Require an admin to review and promote outputs
`promote_on_success`	Create canonical records when the run completes
`replace_on_success`	Replace or supersede canonical records when the run completes

Read-cleaning pipelines should stage candidate reads instead of silently overwriting active reads:


{
  "id": "cleaned_read_candidates",
  "scope": "sample",
  "destination": "run_artifact",
  "type": "artifact",
  "result": {
    "kind": "sample_read_candidate",
    "writebackPolicy": "admin_review",
    "preview": {
      "label": "Cleaned read candidate"
    }
  },
  "discovery": {
    "pattern": "filter/filtered/*_filtered.fastq.gz",
    "matchSampleBy": "filename"
  }
}

SeqDesk stores those files as run artifacts, shows them as pending review, and only switches active reads after an admin promotes selected candidates.
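
For package tooling, the two tables above translate naturally into TypeScript union types. The type names and the replacesCanonicalData helper below are hypothetical and only encode what the tables state; they are not SeqDesk's own definitions.

```typescript
type ResultKind =
  | "run_artifact"
  | "sample_read_metadata"
  | "sample_read_candidate"
  | "sample_read_replace"
  | "sample_assembly"
  | "sample_bin";

type WritebackPolicy =
  | "none"
  | "metadata_only"
  | "stage_only"
  | "admin_review"
  | "promote_on_success"
  | "replace_on_success";

type OutputResult = {
  kind: ResultKind;
  writebackPolicy: WritebackPolicy;
  preview?: { label: string };
};

// Flag outputs that replace canonical data without a review step, so a
// package author can double-check that this is really intended.
function replacesCanonicalData(result: OutputResult): boolean {
  return result.kind === "sample_read_replace" || result.writebackPolicy === "replace_on_success";
}

const cleanedReadCandidates: OutputResult = {
  kind: "sample_read_candidate",
  writebackPolicy: "admin_review",
  preview: { label: "Cleaned read candidate" },
};

const isDestructive = replacesCanonicalData(cleanedReadCandidates); // false
```

With the unions in place, a typo such as "admin-review" becomes a compile-time error instead of a schema failure at validation time.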

Testing Pipeline Integrations

Before publishing a package, test the full path with small fixtures:

Validate the descriptor with npm run pipeline:validate.
Run samplesheet generation against dummy order or study data.
Run output discovery against a tiny output fixture directory.
Resolve discovered outputs and confirm the expected database writes.
For read writeback, test both safe no-overwrite behavior and explicit promotion/replacement behavior.

Use dummy FASTQ files for fast contract tests. Add heavier synthetic contamination/reference fixtures later when validating the real classifier configuration.

Automated integration tests (private CI mirror)

For security, the SeqDesk-to-pipeline integration is tested continuously on a private mirror (hzi-bifo/SeqDesk-ci) of the public repository. A “Mirror to private CI” workflow pushes every change on main to that mirror, where the heavier integration suites run on a self-hosted runner. Those jobs are guarded by if: github.repository == 'hzi-bifo/SeqDesk-ci', so they never run on the public repo and the self-hosted Slurm runner stays off GitHub’s public surface.

On a real AlmaLinux Slurm cluster, the end-to-end suite boots the freshly built application and submits pipelines in both local and Slurm execution modes, exercising real SBATCH generation plus the failure, cancel, and no-data paths. A daily install suite then installs the app and runs the heavier production pipelines (metaxpath, nf-core/mag, read-cleaning) end to end on real data, and ENA checks prove the submission pipeline can submit to the ENA test server and write back a real accession. The mirror is needed because these tests depend on things that must stay private: optional private pipeline add-ons (such as metaxpath), ENA Webin and hosted-profile credentials supplied as CI secrets, large reference databases staged on the cluster’s shared filesystem, and the self-hosted runner itself.

The push-triggered order and study pipeline checks validate every mirrored change, while the heavier Slurm and install suites run on a daily schedule (plus on demand). Built-in packages get this coverage automatically; a contributed pipeline gains it once it is accepted into the catalog and ships the dummy test data described below.

Troubleshooting

Validation runs through scripts/validate-pipeline-package.ts, which lints every package directory. The script prints one line per issue:


[ERROR] <packageId>: <message> (<file>)
[WARN]  <packageId>: <message> (<file>)
Checked N package(s): E error(s), W warning(s)

Exit behavior: the script exits 1 only when there is at least one error. Warnings never fail the check — they are advisory, so a green run can still print [WARN] lines.
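
The output format and exit rule can be sketched as follows. The Issue type and formatReport helper are hypothetical, and the exit code of 0 for a run without errors is an assumption; only the line layout and the "exit 1 on any error" rule come from the description above.

```typescript
type Issue = {
  level: "error" | "warning";
  packageId: string;
  message: string;
  file: string;
};

// Render one line per issue plus the summary line, and derive the exit code.
function formatReport(issues: Issue[], packageCount: number): { lines: string[]; exitCode: number } {
  const errorCount = issues.filter((issue) => issue.level === "error").length;
  const warningCount = issues.length - errorCount;
  const lines = issues.map((issue) => {
    // "[WARN] " is padded so both prefixes line up, as in the sample output.
    const prefix = issue.level === "error" ? "[ERROR]" : "[WARN] ";
    return `${prefix} ${issue.packageId}: ${issue.message} (${issue.file})`;
  });
  lines.push(`Checked ${packageCount} package(s): ${errorCount} error(s), ${warningCount} warning(s)`);
  // Warnings are advisory: only errors produce a failing exit code.
  return { lines, exitCode: errorCount > 0 ? 1 : 0 };
}

const warningOnly = formatReport(
  [
    {
      level: "warning",
      packageId: "my-pipeline",
      message: "README not found: README.md.",
      file: "manifest.json",
    },
  ],
  1,
);
// warningOnly.exitCode is 0 even though a [WARN] line is printed
```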

Manifest errors short-circuit: if manifest.json is missing, is not valid JSON, or fails the schema, the linter returns immediately and you will not see the later checks until that is fixed. Fix the manifest first, then re-run.

Errors (these fail validation)

Message	Cause	Fix
`package.id "<x>" does not match expected package id "<dir>".`	The `package.id` in `manifest.json` differs from the folder name (the expected id is the directory name).	Rename the folder or change `package.id` so they are identical.
`Missing manifest.json.`	No `manifest.json` at the package root.	Add a `manifest.json` at the top of the package directory.
`manifest.json is not valid JSON.`	The file could not be parsed (trailing comma, comment, etc.).	Make it strict JSON (no comments, no trailing commas).
`manifest.json schema invalid: <messages>`	The manifest violates the schema. It is `.strict()`, so unknown keys also fail. Common triggers: a missing required `package`/`files`/`inputs`/`execution`/`outputs`, empty strings where a value is required, a `destination` not in the allowed set, or `execution.type` not equal to `"nextflow"`.	Read the appended messages — they name the failing path. Remove stray keys and supply every required field.
`Missing required file: <key> (<path>).`	One of the three required files in the `files` map — `definition`, `registry`, `samplesheet` — points at a path that does not exist.	Create the file or fix the relative path in `files`.
`Parser file not found: <path>.`	An entry in `files.parsers` does not exist on disk.	Add the parser YAML or correct the path.
`<key> script not found: <path>.`	A `files.scripts.*` path does not exist.	Add the script or fix the path.
`definition.pipeline "<x>" does not match package.id "<y>".`	The `pipeline` field in `definition.json` doesn’t equal `package.id`.	Set `definition.pipeline` to the same id as `package.id`.
`registry.id "<x>" does not match package.id "<y>".`	The `id` field in `registry.json` doesn’t equal `package.id`.	Set `registry.id` to the same id as `package.id`.
`Every definition step must have a non-empty id.`	A step in `definition.json` has a missing/blank `id`.	Give every step a non-empty string `id`.
`Duplicate definition step id: <id>.`	Two definition steps share an `id`.	Make every step id unique.
`Step "<a>" depends on missing step "<b>".`	A step’s `dependsOn` lists an id that isn’t a defined step.	Reference an existing step id, or add the missing step.
`Manifest output "<id>" references missing definition step "<step>".`	A manifest `outputs[].fromStep` doesn’t match any step in `definition.json`.	Use a step id that exists in the definition.
`Duplicate output id "<id>".`	Two manifest outputs share an `id`.	Make every output id unique.
`Output "<id>" uses Read writeback but destination is "<x>" instead of "sample_reads".`	An output declares `writeback.target: "Read"` but its `destination` isn’t `sample_reads`.	Set `destination: "sample_reads"` for Read writeback outputs.
`samplesheet.yaml must define at least one column.`	The samplesheet YAML has no `samplesheet.columns`.	Define at least one column.

Warnings (advisory — do not fail the build)

Message	Cause	Fix
`Step "<id>" has no processMatchers, so trace progress cannot map Nextflow processes to this DAG step.`	A step (in a `nextflow` package) declares no `processMatchers`, so live progress can’t attribute Nextflow processes to it.	Add a `processMatchers` array to each runnable step.
`Output "<id>" references parser "<from>" which was not found.`	An output’s `parsed.from` names a parser id that no `files.parsers` file defines.	Make `parsed.from` match a real `parser.id`, or add the parser file.
`Local execution.pipeline path does not exist: <path>. This is OK only if a custom runner handles the package.`	A local `execution.pipeline` doesn’t exist and no custom runner is declared.	Ship the workflow at that path, or set a custom runner.
`paramMap.<key> should be a Nextflow flag, plain token, or empty SeqDesk-only mapping.`	A `paramMap` value isn’t a flag/plain-token/empty mapping.	Use a `-flag`, a plain `[A-Za-z0-9_.-]` token, or `""`.
`README not found: <path>.`	`files.readme` is set but the file is missing.	Add the README or drop the field.
`samplesheet.yaml should define a "sample" or "sample_id" column for SeqDesk sample matching.`	No `sample`/`sample_id` column, so SeqDesk can’t match samples.	Add a `sample` or `sample_id` column.
`No curated outputs are configured. Runs can still use raw output folder browsing.`	The manifest `outputs` array is empty.	Add curated outputs (optional, but recommended).

My package isn’t being checked at all

If validate-pipeline-package.ts reports Checked N package(s) and your package isn’t among them, it was skipped by the directory filter: the script, the runtime package loader, and the install-profile applier all ignore any directory whose name starts with . or _. That is why the scaffold lives at pipelines/_example: the leading underscore keeps the template from being loaded or validated as a real package. Fix: rename your folder so it doesn’t begin with . or _.
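
The filter itself is easy to picture. The helper name below is hypothetical, but the rule (skip names starting with . or _) is the one described above.

```typescript
// A directory is treated as a package only if its name does not start
// with "." or "_".
function isLoadablePackageDir(dirName: string): boolean {
  return !/^[._]/.test(dirName);
}

const loadable = ["fastq-checksum", "_example", ".git", "mag"].filter(isLoadablePackageDir);
// loadable is ["fastq-checksum", "mag"]
```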

Operational pitfalls

These problems pass the descriptor linter (they aren’t manifest issues) but break real runs:

Slurm scripts must write to logs/pipeline.out and logs/pipeline.err. The progress monitor and the in-app log viewer read those two fixed paths inside the run folder — not Slurm’s own --output/--error files. The generated submit scripts already redirect there; if a custom runner redirects only to Slurm’s job log, the run shows no progress and an empty log pane. Fix: redirect (or tee) your pipeline’s stdout/stderr into logs/pipeline.out and logs/pipeline.err under the run folder.
Reference databases are not bundled in the package. Packages ship descriptors, parsers, and the workflow — not the (often very large) reference DBs. Databases are provisioned centrally: an admin manages them through the admin pipeline settings, and the install (config) profile points runs at the shared location. Fix: don’t commit DBs or hardcode a DB path in your manifest; expose the DB location as a config key/param (see Settings and parameter mapping by example) and let it resolve from the centrally configured directory.

Registration

New pipelines are automatically discovered from the pipelines/ directory. SeqDesk loads the manifest and registry files together, derives pipeline scope from the manifest, and exposes the package through the installed-pipeline list and public registry metadata.

To enable a pipeline, set its enabled flag in the PipelineConfig database table or through the admin settings.

Contributing to the Official Pipeline Store

SeqDesk hosts an official pipeline store at seqdesk.org (Admin → Pipelines → Store), where facilities browse and install curated pipelines. The store is served by the SeqDesk website — whose repository is still named seqdesk.com — from a catalog index and exported package payloads; the application only browses and installs over HTTP and does not serve the registry itself.

Requirements for a store pipeline

These requirements apply to a contributed pipeline you are proposing for the store. Built-in pipelines that ship in this repository are covered by the repository’s own LICENSE and are exercised by the automated end-to-end suites, so they do not each carry a separate license file or dummy fixture. A ## Citation section, however, is expected of every package — built-in and contributed alike.

Nextflow only. The executor currently supports Nextflow pipelines (execution.type: "nextflow"), runnable in local and Slurm modes. Other workflow engines are not yet supported.
Ships a dummy fixture. A contributed package must include a minimal test-data/ fixture (tiny, synthetic inputs, no real or sensitive data) so the pipeline can be run on dummy data. The pull-request check validates the package descriptor (npm run pipeline:validate); actual dummy-data execution is added by the maintainers when the pipeline is accepted (see Review and integration below).
Runs locally. It must pass npm run pipeline:validate and have been run locally on its fixture. Running on a Slurm cluster is not required of the contributor; the maintainers verify Slurm/AlmaLinux execution on the private mirror during integration.
Licensed. The package must declare its license and whether it is public (shipped in the release and downloadable from the store) or private/licensed (the metaxpath model: listed with status: "private" and source.kind: "github" pointing at a separate repository). Public-store pipelines should carry an open-source license.
Citable. Declare how to cite the pipeline — its authors and any paper(s) or DOI(s) — so facilities and downstream users credit the original work. Put this in the package README.md, and for public packages in the store catalog entry.

Pipeline settings and reference databases

User-tunable settings and reference databases use the same mechanism, so a contributed pipeline needs no app code to surface them:

Settings are declared in registry.json under configSchema (a JSON-Schema properties map with title/description/default/enum, plus optional x-seqdesk UI hints such as placement and helpText) and defaultConfig; visibility controls who can see and start the pipeline. SeqDesk renders these as the launch-time configuration form, and the manifest’s execution.paramMap / execution.paramRules / execution.defaultParams translate each setting into the Nextflow flags your workflow expects (for example tax2filter → --tax2filter).
Reference databases are simply settings of this kind: a config key (e.g. kraken2Db) mapped via paramMap to a Nextflow flag (--kraken2db). A facility supplies the actual path centrally — through the admin pipeline settings and the install (config) profile / settings.json — which prefills the parameter for every run and can hide the per-run field from researchers. So a database requirement both faces the user (as a configurable field) and wires into the facility’s config profile: declare the parameter in configSchema + paramMap, and document in your README.md what reference data the facility must provide.
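
The translation from settings to Nextflow flags described above can be sketched roughly as follows. This is an illustration, not SeqDesk's executor code: buildNextflowArgs, its argument shapes, and the example values are hypothetical, paramRules is left out entirely, and it assumes paramMap values are written with their leading dashes. Only the kraken2Db and tax2filter mappings come from the text.

```typescript
// Hypothetical types: the real manifest and registry shapes may differ.
type ConfigValue = string | number | boolean;
type Config = Record<string, ConfigValue | undefined>;
// Config key -> Nextflow flag, assumed to include the leading dashes.
type ParamMap = Record<string, string>;

// Merge the package defaults with the values chosen at launch,
// then emit one flag/value pair per mapped setting.
function buildNextflowArgs(
  paramMap: ParamMap,
  defaultConfig: Config,
  launchConfig: Config,
): string[] {
  const merged: Config = { ...defaultConfig, ...launchConfig };
  const args: string[] = [];
  for (const key of Object.keys(paramMap)) {
    const value = merged[key];
    // Skip unset or blank settings so the workflow keeps its own default.
    if (value === undefined || value === "") continue;
    args.push(paramMap[key], String(value));
  }
  return args;
}

const args = buildNextflowArgs(
  { kraken2Db: "--kraken2db", tax2filter: "--tax2filter" },
  { tax2filter: "example-taxon" }, // made-up default value
  { kraken2Db: "/refs/kraken2/standard" }, // made-up facility path
);
console.log(args.join(" "));
```

In this sketch a centrally configured database path would simply arrive as part of launchConfig, which is one way to read "prefills the parameter for every run".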

Private and licensed packages

A private package is not shipped in the release tarball. Instead its catalog entry points the app at a separate repository, and the app fetches the package at install time using a credential the facility supplies. This is exactly how metaxpath is listed in the store catalog index:


{
  "id": "metaxpath",
  "status": "private",
  "isPrivate": true,
  "licenseRequired": true,
  "source": {
    "kind": "github",
    "label": "GitHub",
    "repository": "hzi-bifo/MetaxPath-Nextflow",
    "refDefault": "main",
    "descriptorPath": ".seqdesk/pipelines/metaxpath",
    "includeWorkflow": true,
    "keyLabel": "GitHub token"
  }
}

descriptorPath is where the manifest.json / registry.json / … live inside that repository, and includeWorkflow: true lets SeqDesk snapshot the workflow into the local installation.
keyLabel is the credential prompt the install UI shows the facility (here, a GitHub token).
Either isPrivate or licenseRequired alone is enough for SeqDesk to treat the entry as private, even if source.kind is omitted, but setting all of these fields explicitly, as in the example above, is safer.
Public packages instead use source.kind: "registry", carry status: "available", and ship in the release. See Available Pipelines for the operator-facing view of an optional/private package.
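
The rule about isPrivate and licenseRequired can be read as a small predicate. The sketch below is an assumption about how such a check might look, not the app's actual code: PrivateCatalogEntry and isPrivateEntry are hypothetical names, and the interface covers only the fields shown in the example above.

```typescript
// Hypothetical subset of a store catalog entry.
interface PrivateCatalogEntry {
  id: string;
  status?: string;
  isPrivate?: boolean;
  licenseRequired?: boolean;
  source?: { kind?: string };
}

// Per the text, either flag alone marks the entry as private,
// regardless of whether source.kind is present.
function isPrivateEntry(entry: PrivateCatalogEntry): boolean {
  return entry.isPrivate === true || entry.licenseRequired === true;
}
```

Whether status: "private" or source.kind: "github" on its own also triggers private handling is not stated here, so the sketch deliberately ignores both.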

Submitting a pipeline

Build the package in the standard pipelines/<id>/ format, starting from pipelines/_example/ or the seqdesk-pipeline-template repository: manifest.json, definition.json, registry.json, samplesheet.yaml, README.md, optional scripts/ and parsers/ directories, plus your test-data/ fixture.
Validate it locally: npm run pipeline:validate pipelines/<id>.
Open a pull request to the SeqDesk app repository adding your pipelines/<id>/ package, using the pipeline PR template. Cover: id, name, version, authors and citation; targets (order and/or study); Nextflow profiles; inputs; settings and any reference-database parameters; outputs and writeback policy; license and public/private.
Review and integration (manual, by design). A maintainer reviews the package, runs it on its dummy fixture, and wires it into the automated integration suite on the private mirror so it keeps working on every change. On acceptance, a public package ships in the release tarball, and the maintainers regenerate the store catalog entry in the seqdesk.com website repository (npm run pipeline-registry:export <id>, added to index.json, validated by npm run registry:validate). Integration is intentionally a manual maintainer step — contributed pipelines are not auto-merged.

Frequently Asked Questions

Which workflow engines are supported?

Nextflow only. In the manifest schema, execution.type is the literal string "nextflow" — no other value validates. A package declares the workflow it runs and the profiles it supports under execution:


"execution": {
  "type": "nextflow",
  "pipeline": "./workflow",
  "version": "1.0.0",
  "profiles": ["conda"]
}

The executor runs Nextflow in both local and Slurm modes; see Running Pipelines.

Can I contribute a non-Nextflow pipeline (Snakemake, WDL, a plain script)?

Not today. Because execution.type only accepts "nextflow", a Snakemake/WDL/shell workflow cannot be packaged as-is. The supported path is to wrap your workflow in a thin Nextflow pipeline that calls it, then package that.

Do I need a Slurm cluster to contribute?

No. As a contributor you must pass npm run pipeline:validate pipelines/<id> and have run the package locally on its test-data/ fixture. Verifying Slurm / AlmaLinux execution is the maintainers’ job — they run it on the private CI mirror (hzi-bifo/SeqDesk-ci) during integration, where the self-hosted Slurm runner lives. See Testing Pipeline Integrations.

How is Study vs Sequencing Order scope decided?

From the manifest, not the UI. targets.supported is an array of "study" and/or "order", and SeqDesk derives the catalog/scope from it at discovery time; registry.json is presentation only and is not the source of truth for scope. A package can support one or both:


"targets": { "supported": ["order"] }

For the difference between the two scopes (where each starts, what it writes back), see Study vs Sequencing Order Pipelines.
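
As a rough illustration of deriving scope from the manifest, the helper below reads targets.supported and keeps only the two recognised values. supportedScopes is a hypothetical name, and dropping unknown or duplicate values is an assumption; the real discovery code may reject such a manifest instead.

```typescript
type Scope = "study" | "order";

// Hypothetical helper: returns the recognised scopes in manifest order,
// without duplicates. Anything else in the array is ignored here.
function supportedScopes(manifest: { targets?: { supported?: unknown } }): Scope[] {
  const raw = manifest.targets?.supported;
  if (!Array.isArray(raw)) return [];
  const scopes: Scope[] = [];
  for (const value of raw) {
    if ((value === "study" || value === "order") && scopes.indexOf(value) === -1) {
      scopes.push(value);
    }
  }
  return scopes;
}
```

Note that registry.json is never consulted, which matches the point that it is presentation only.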

How do reference databases get set — per run, or centrally?

A reference database is just a normal pipeline setting: declare it as a config key in registry.json under configSchema.properties and map it to a Nextflow flag via the manifest’s execution.paramMap (for example kraken2Db → --kraken2db). It then faces the user as a launch-time field and wires into the facility’s config profile. A facility normally supplies the path centrally through the admin pipeline settings and the install (config) profile / settings.json, which prefills the value for every run. The read-cleaning package even hides the per-run field once the server is configured, via "x-seqdesk": { "hideWhenServerConfigured": true } on its kraken2Db property. Document the required reference data in your README.md.

How do I ship a private or licensed pipeline?

Follow the MetaxPath model: the package is listed in the public store but the code lives in a separate, access-controlled GitHub repository — set status, isPrivate, licenseRequired, and a GitHub source on the catalog entry. See Private and licensed packages for the exact shape.

How is my pipeline tested after I submit?

The pull-request check validates your package descriptor with npm run pipeline:validate (the manifest schema, plus the registry, samplesheet, and definition files). It does not run your workflow. Dummy-data execution is added by the maintainers when the pipeline is accepted: they run it on its test-data/ fixture and wire it into the integration suite on the private mirror, so it then gets the same order/study push checks and the daily Slurm + install end-to-end runs the built-in packages get. See Automated integration tests.

How do I require that my pipeline be cited?

Declare the citation in the package README.md — a ## Citation section is expected of every package, built-in and contributed — listing the authors and any paper(s) or DOI(s). For a public store pipeline, also include the citation/author info in the store catalog entry. There is no separate enforcement field; the citation requirement is part of the contribution checklist and the PR template.

Where does the official pipeline store actually live?

In the website repository (still named seqdesk.com), not in the app. The catalog is the registry index at src/data/registry/index.json with per-pipeline files under src/data/registry/pipelines/, served from https://seqdesk.org under /api/registry. The application only browses and installs from the store over HTTP; it does not serve the registry itself.

How do I point an installation at a different (or extra) store?

The store base URL and registry endpoints are environment-driven: SEQDESK_PIPELINE_STORE_URL (default https://seqdesk.org), SEQDESK_PIPELINE_REGISTRY_URL (defaults to <store>/api/registry), and SEQDESK_PIPELINE_REGISTRY_URLS for a comma-separated list of registries. This lets a facility run its own internal store or browse several at once.
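
The defaults described above can be sketched as a small resolver. The variable names and default values are taken from the text, while resolveRegistryUrls itself is hypothetical, and the precedence (the list variable winning over the single-registry variable) is an assumption that the real application may not share.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_STORE_URL = "https://seqdesk.org";

// Assumption: a non-empty SEQDESK_PIPELINE_REGISTRY_URLS takes precedence
// over SEQDESK_PIPELINE_REGISTRY_URL. The real precedence may differ.
function resolveRegistryUrls(env: Env): string[] {
  const list = (env.SEQDESK_PIPELINE_REGISTRY_URLS || "")
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);
  if (list.length > 0) return list;

  // Strip trailing slashes so the derived registry URL has exactly one.
  const store = (env.SEQDESK_PIPELINE_STORE_URL || DEFAULT_STORE_URL).replace(/\/+$/, "");
  return [env.SEQDESK_PIPELINE_REGISTRY_URL || `${store}/api/registry`];
}
```

In a Node process this would typically be called as resolveRegistryUrls(process.env); taking the environment as a parameter keeps the function easy to test.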

How do I publish a new version of an existing pipeline?

Add the new version to that pipeline’s versions array in the store catalog (in the seqdesk.com repo) and update latestVersion. Public packages carry a downloadUrl for each version; GitHub-backed private packages list versions without one and resolve the code from the repository ref instead. Maintainers regenerate the catalog with npm run pipeline-registry:export <id> and validate it with npm run registry:validate, both run in the website repo.
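
To make the catalog change concrete, the sketch below models an entry with a versions array and latestVersion and appends a release. It only illustrates the resulting data shape; in practice the maintainers' export command produces the entry. versions, latestVersion, and downloadUrl come from the text, while the version field name, addVersion, and the sample values are assumptions.

```typescript
interface CatalogVersion {
  version: string; // assumed field name
  downloadUrl?: string; // omitted for GitHub-backed private packages
}

interface VersionedCatalogEntry {
  id: string;
  latestVersion: string;
  versions: CatalogVersion[];
}

// Return a copy of the entry with the new version appended and
// latestVersion updated; refuse to add the same version twice.
function addVersion(
  entry: VersionedCatalogEntry,
  next: CatalogVersion,
): VersionedCatalogEntry {
  if (entry.versions.some((v) => v.version === next.version)) {
    throw new Error(`version ${next.version} already listed for ${entry.id}`);
  }
  return {
    ...entry,
    latestVersion: next.version,
    versions: [...entry.versions, next],
  };
}

// Made-up public package with a per-version download URL.
const before: VersionedCatalogEntry = {
  id: "example-pipeline",
  latestVersion: "1.0.0",
  versions: [{ version: "1.0.0", downloadUrl: "https://store.example.org/example-pipeline-1.0.0.tgz" }],
};
const after = addVersion(before, {
  version: "1.1.0",
  downloadUrl: "https://store.example.org/example-pipeline-1.1.0.tgz",
});
```

For a GitHub-backed private package the same call would pass a version without downloadUrl.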

Is integration automatic, or does a maintainer have to do it?

Manual, by design. Contributed pipelines are not auto-merged. After your PR, a maintainer reviews the package, runs it on its dummy fixture, and wires it into the automated integration suite on the private mirror. Keeping integration a deliberate maintainer step is a security choice — the self-hosted Slurm runner and private credentials never touch an untrusted PR.