Running a Pipeline

SeqDesk supports two launch contexts:

Study pipelines run across the selected samples of a study.
Sequencing Order pipelines run on the linked sequencing files of samples in a sequencing order.

In both cases, SeqDesk prepares the package inputs automatically and executes the workflow either locally or on a SLURM cluster.

Common Prerequisites

Before running a pipeline:

Pipelines must be enabled in admin settings (pipelines.enabled: true)
The execution environment must be configured (local or SLURM)
You need the FACILITY_ADMIN role

Some packages require linked reads. For example, MAG, FASTQ Checksum, FastQC, and Read Cleaning require FASTQ files to already be linked. Simulate Reads is the exception because it generates read files instead of consuming existing ones.
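Taken together, these prerequisites work as a pre-flight check. The following is a minimal sketch of that check with simple in-memory inputs. The `Sample` class, the `launch_blockers` function, and the `PACKAGES_NEEDING_READS` set are hypothetical illustrations, not SeqDesk's actual code.

```python
from dataclasses import dataclass, field

# Hypothetical: packages that consume existing reads, taken from the list above.
PACKAGES_NEEDING_READS = {"MAG", "FASTQ Checksum", "FastQC", "Read Cleaning"}


@dataclass
class Sample:
    sample_id: str
    linked_fastqs: list = field(default_factory=list)


def launch_blockers(
    pipelines_enabled: bool,
    execution_configured: bool,
    roles: set,
    package: str,
    samples: list,
) -> list:
    """Return the reasons a launch should be refused; an empty list means OK."""
    problems = []
    if not pipelines_enabled:
        problems.append("pipelines.enabled is false")
    if not execution_configured:
        problems.append("no local or SLURM execution environment configured")
    if "FACILITY_ADMIN" not in roles:
        problems.append("user lacks the FACILITY_ADMIN role")
    # Simulate Reads is not in the set: it creates reads instead of consuming them.
    if package in PACKAGES_NEEDING_READS:
        unlinked = [s.sample_id for s in samples if not s.linked_fastqs]
        if unlinked:
            problems.append("samples without linked FASTQ files: " + ", ".join(unlinked))
    return problems
```

For example, a FastQC launch over one linked and one unlinked sample yields a single blocker that names the unlinked sample. The same samples pass for Simulate Reads.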

Launching a Study Pipeline

Study pipelines are the right choice for workflows that combine multiple samples into larger analyses, reports, or submission jobs.

Open the study

Navigate to the study that contains your samples. Go to the Pipelines tab.

Select a pipeline

Choose from the available pipelines (e.g., MAG). Each pipeline shows its description and requirements.

Configure parameters

Adjust pipeline-specific settings:

MAG Pipeline options:

Parameter	Default	Description
Stub Mode	false	Test mode: runs quickly without performing the actual analysis
Skip MEGAHIT	false	Skip the MEGAHIT assembler
Skip SPAdes	true	Skip the SPAdes assembler
Skip Prokka	true	Skip Prokka gene annotation
Skip CONCOCT	true	Skip CONCOCT binning
Skip Bin QC	false	Skip bin quality control
Skip GTDB-Tk	false	Skip GTDB-Tk taxonomic classification
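These options are booleans that eventually appear on the Nextflow command line. The following is a minimal sketch of that translation. It assumes the options map onto nf-core/mag's `--skip_*` parameters and that Stub Mode maps to Nextflow's `-stub-run` option. The option keys and `mag_cli_args` are hypothetical, and SeqDesk's real mapping may differ.

```python
from typing import Optional

# Defaults from the table above. The keys are hypothetical option names that
# mirror nf-core/mag's --skip_* parameters (assumption, not SeqDesk's schema).
MAG_DEFAULTS = {
    "stub_mode": False,
    "skip_megahit": False,
    "skip_spades": True,
    "skip_prokka": True,
    "skip_concoct": True,
    "skip_binqc": False,
    "skip_gtdbtk": False,
}


def mag_cli_args(overrides: Optional[dict] = None) -> list:
    """Merge overrides into the defaults and render Nextflow command-line arguments."""
    opts = {**MAG_DEFAULTS, **(overrides or {})}
    unknown = sorted(set(opts) - set(MAG_DEFAULTS))
    if unknown:
        raise ValueError(f"unknown MAG options: {unknown}")
    args = []
    if opts["stub_mode"]:
        # Nextflow runs each process's stub block (where defined) instead of the tool.
        args.append("-stub-run")
    for key, enabled in opts.items():
        if key.startswith("skip_") and enabled:
            args.append(f"--{key}")  # a bare --flag sets the pipeline param to true
    return args
```

With the defaults, only SPAdes, Prokka, and CONCOCT are skipped. The sketch rejects unknown option names outright, so a typo cannot silently turn into an ignored parameter.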

Select samples

Choose which samples from the study to include. Every selected sample must have linked reads.

Launch

Confirm and start the run. SeqDesk:

Generates a samplesheet CSV from your samples and reads
Creates a run directory (e.g., MAG-20240126-001/)
Builds the Nextflow execution command
Starts the pipeline (locally or via SLURM)
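These launch steps can be sketched as a function that creates the run directory, writes the generated samplesheet, and emits a `run.sh` that invokes Nextflow. This is a minimal sketch with several assumptions. `prepare_run_dir` is hypothetical, and the pipeline is assumed to accept nf-core style `--input`/`--outdir` parameters. The `-with-trace`, `-with-report`, `-with-timeline`, and `-with-dag` options are standard `nextflow run` options, chosen here to match the files listed under Run Directory Structure.

```python
import shlex
from pathlib import Path


def prepare_run_dir(base: Path, run_number: str, samplesheet_csv: str,
                    pipeline_args: list) -> Path:
    """Create {base}/{run_number}/ with samplesheet.csv, logs/ and run.sh (hypothetical)."""
    run_dir = base / run_number
    run_dir.mkdir(parents=True)  # raises FileExistsError if the run directory already exists
    (run_dir / "logs").mkdir()
    (run_dir / "samplesheet.csv").write_text(samplesheet_csv)

    cmd = [
        "nextflow", "run", *pipeline_args,
        "--input", "samplesheet.csv", "--outdir", "output",  # nf-core style params (assumed)
        "-with-trace", "trace.txt",
        "-with-report", "report.html",
        "-with-timeline", "timeline.html",
        "-with-dag", "dag.dot",
    ]
    script = "\n".join([
        "#!/usr/bin/env bash",
        "set -euo pipefail",
        'cd "$(dirname "$0")"',  # make relative paths resolve inside the run directory
        shlex.join(cmd) + " > logs/pipeline.out 2> logs/pipeline.err",
        "",
    ])
    run_sh = run_dir / "run.sh"
    run_sh.write_text(script)
    run_sh.chmod(0o755)
    return run_dir
```

Quoting the command with `shlex.join` keeps user-influenced arguments, such as paths, from being reinterpreted by the shell when `run.sh` executes.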

Launching a Sequencing Order Pipeline

Sequencing Order pipelines are the right choice for sample-level sequencing utilities such as read simulation, checksum validation, and read QC.

Open the sequencing order

Navigate to the sequencing order you want to work on. Depending on the package and your current workflow, launch it from that order's sequencing area or its pipeline area.

Review linked sequencing files

Check whether the samples already have linked FASTQ files. This is required for packages such as FASTQ Checksum and FastQC. If the sequencing order has no reads yet, you can start with Simulate Reads.

Select a sequencing order pipeline

Choose the package you want to run for that sequencing order. The current built-in sequencing order catalog includes Simulate Reads, FASTQ Checksum, FastQC, and Read Cleaning.

Configure parameters

Sequencing Order pipelines typically expose fewer configuration options than study pipelines. Examples:

Pipeline	Example parameters
Simulate Reads	Mode, read count, read length, replace existing files
FASTQ Checksum	Usually no additional configuration
FastQC	Usually no additional configuration

Launch

Confirm and start the run. SeqDesk:

Generates the package inputs from sequencing order samples and linked reads
Creates a run directory for the package
Builds the Nextflow execution command
Starts the pipeline and tracks the run
Resolves output artifacts after completion and performs a validated writeback to the samples' Read records

Run Number Format

Each run gets a unique number: {PIPELINE}-{YYYYMMDD}-{NNN} (e.g., MAG-20240126-001).
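The following sketch allocates the next run number. It assumes the `NNN` counter restarts at 001 for each pipeline and day. The format suggests this, but it is not stated explicitly. `next_run_number` is hypothetical. A real implementation would also need a uniqueness constraint in the database so that two concurrent launches cannot pick the same number.

```python
import re
from datetime import date


def next_run_number(pipeline: str, day: date, existing: list) -> str:
    """Return {PIPELINE}-{YYYYMMDD}-{NNN}, one past the highest NNN used that day."""
    prefix = f"{pipeline}-{day:%Y%m%d}-"
    pattern = re.compile(re.escape(prefix) + r"(\d{3})")
    used = [int(m.group(1)) for name in existing if (m := pattern.fullmatch(name))]
    return f"{prefix}{max(used, default=0) + 1:03d}"
```

Run numbers from other pipelines or other days do not affect the counter, because only names that start with the exact `{PIPELINE}-{YYYYMMDD}-` prefix are considered.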

Run Lifecycle

A run moves through these statuses:


pending → queued → running → completed
                          └─→ failed
                          └─→ cancelled

Status	Meaning
`pending`	Run record created, not yet launched
`queued`	Submitted to SLURM, waiting for resources (SLURM mode)
`running`	The workflow is executing
`completed`	Finished successfully and outputs resolved
`failed`	Pre-launch validation, input preparation, or execution failed
`cancelled`	Manually cancelled
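The lifecycle can also be expressed as a table of allowed transitions. In this sketch, `ALLOWED` and `transition` are hypothetical. The diagram shows `failed` and `cancelled` branching from `running`. The sketch also makes three further assumptions:

- A local run goes straight from `pending` to `running`, since `queued` applies only to SLURM mode.
- A run can move from `pending` to `failed` before launch, as described under Key behaviors.
- A run can be cancelled while it is still pending or queued.

```python
# Hypothetical transition table. Beyond the diagram, it assumes local runs skip
# "queued", pre-launch failures go pending -> failed, and pending/queued runs
# may be cancelled.
ALLOWED = {
    "pending": {"queued", "running", "failed", "cancelled"},
    "queued": {"running", "failed", "cancelled"},
    "running": {"completed", "failed", "cancelled"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
    "cancelled": set(),  # terminal
}


def transition(current: str, new: str) -> str:
    """Validate a status change and return the new status."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal status change: {current} -> {new}")
    return new
```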

Key behaviors:

Pre-launch validation. Before a run is launched, SeqDesk validates pipeline metadata and the derived launch config. If validation or input preparation fails, the run is moved to failed and never launched.
Single launch. Launching atomically claims a pending run. A second start request for an already-claimed run is rejected with 409 Conflict, so a run can never be double-started.
Local execution. Local runs are started as a detached bash run.sh process and tracked via queueJobId = local-{pid}.
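The single-launch guarantee amounts to a compare-and-set on the run's status. The following minimal sketch uses SQLite and a hypothetical `pipeline_run` table with `id` and `status` columns. `claim_run` is an illustration, not SeqDesk's code. A caller that gets `False` back would respond with 409 Conflict.

```python
import sqlite3


def claim_run(conn: sqlite3.Connection, run_id: int, next_status: str) -> bool:
    """Atomically move a pending run to next_status; False means someone else claimed it."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            # Hypothetical schema: pipeline_run(id, status).
            "UPDATE pipeline_run SET status = ? WHERE id = ? AND status = 'pending'",
            (next_status, run_id),
        )
    # Only the request whose UPDATE matched the still-pending row sees rowcount == 1.
    return cur.rowcount == 1
```

The check and the write happen in a single conditional `UPDATE`. The database therefore serializes competing requests, and at most one of them matches the row. Reading the status first and updating it in a separate statement would leave a window for a double start.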

Input Generation

SeqDesk auto-generates the package inputs that Nextflow expects. The exact file shape depends on the package, but the source data always comes from canonical SeqDesk records.

Scope	Typical generated input
Study pipeline	A study-level samplesheet built from selected samples and their reads
Sequencing Order pipeline	A samplesheet or manifest generated from sequencing order samples and linked reads

For the MAG pipeline, each row contains:

Column	Source
sample	Sample ID (`sample.sampleId`)
group	Study ID (`study.id`) — used for co-assembly grouping
short_reads_1	Path to R1 FASTQ file
short_reads_2	Path to R2 FASTQ file
short_reads_platform	Sequencing-technology selector, mapped to `ILLUMINA`/`DNBSEQ`/`OXFORD`/`PACBIO`… (required)
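Generating the MAG samplesheet can be sketched with Python's `csv` module. The `ReadPair` record, `mag_samplesheet`, and the keys in `PLATFORM_BY_TECHNOLOGY` are hypothetical. Only the column names and the platform values come from the table above. Because the platform column is required, the sketch raises an error for an unmapped technology rather than guessing.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical selector keys; only the values come from the table above.
PLATFORM_BY_TECHNOLOGY = {
    "illumina": "ILLUMINA",
    "dnbseq": "DNBSEQ",
    "nanopore": "OXFORD",
    "pacbio": "PACBIO",
}

COLUMNS = ["sample", "group", "short_reads_1", "short_reads_2", "short_reads_platform"]


@dataclass
class ReadPair:
    sample_id: str   # sample.sampleId
    study_id: str    # study.id
    r1: str          # path to the R1 FASTQ
    r2: str          # path to the R2 FASTQ
    technology: str  # value from the sequencing-technology selector


def mag_samplesheet(rows: list) -> str:
    """Render one CSV row per sample; all samples of a study share one group."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        platform = PLATFORM_BY_TECHNOLOGY.get(r.technology.lower())
        if platform is None:
            raise ValueError(f"{r.sample_id}: unmapped sequencing technology {r.technology!r}")
        writer.writerow({
            "sample": r.sample_id,
            "group": r.study_id,  # same group => co-assembled together
            "short_reads_1": r.r1,
            "short_reads_2": r.r2,
            "short_reads_platform": platform,
        })
    return buf.getvalue()
```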

The generated input is saved in the run directory as samplesheet.csv (or a package-specific manifest file).

Execution Modes

Local

Nextflow runs directly on the SeqDesk server. Suitable for testing and small datasets.
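The following sketch shows a detached local launch on a POSIX host with `bash` on the `PATH`. `launch_local` is hypothetical. It uses `subprocess.Popen` with `start_new_session=True` so the child runs in its own session, detached from the server's process group and terminal. It returns an ID in the `local-{pid}` form described under Key behaviors.

```python
import subprocess
from pathlib import Path


def launch_local(run_dir: Path) -> str:
    """Start run.sh detached from the calling process; return a local-{pid} job ID."""
    logs = run_dir / "logs"
    logs.mkdir(exist_ok=True)
    with open(logs / "pipeline.out", "ab") as out, open(logs / "pipeline.err", "ab") as err:
        proc = subprocess.Popen(
            ["bash", "run.sh"],
            cwd=run_dir,
            stdin=subprocess.DEVNULL,
            stdout=out,               # the child keeps its own copies of these descriptors
            stderr=err,
            start_new_session=True,   # setsid(): own session, no controlling terminal
        )
    return f"local-{proc.pid}"
```

The parent closes its copies of the log files as soon as the child has started. The function does not wait, so the request that triggered the launch returns immediately.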

SLURM

Nextflow submits jobs to a SLURM cluster. Configure the following in the admin settings:

Setting	Default	Description
Queue	`cpu`	SLURM partition name
Cores	4	CPUs per job
Memory	`64GB`	Memory per job
Time Limit	12h	Maximum run time
Additional Options	—	Extra SLURM flags
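These settings map naturally onto Nextflow's `slurm` executor. The following minimal sketch assumes SeqDesk applies them as process-level directives in the generated `nextflow.config`. `queue`, `cpus`, `memory`, `time`, and `clusterOptions` are real Nextflow process directives. `SlurmSettings` and `slurm_process_config` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlurmSettings:
    # Defaults from the table above (hypothetical field names).
    queue: str = "cpu"
    cores: int = 4
    memory: str = "64GB"
    time_limit: str = "12h"
    additional_options: str = ""


def _q(value: str) -> str:
    """Quote a value as a single-quoted Groovy string."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def slurm_process_config(s: SlurmSettings) -> str:
    """Render a nextflow.config process block for the slurm executor."""
    lines = [
        "process {",
        "    executor = 'slurm'",
        f"    queue = {_q(s.queue)}",
        f"    cpus = {int(s.cores)}",
        f"    memory = {_q(s.memory)}",
        f"    time = {_q(s.time_limit)}",
    ]
    if s.additional_options:
        lines.append(f"    clusterOptions = {_q(s.additional_options)}")  # extra sbatch flags
    lines.append("}")
    return "\n".join(lines) + "\n"
```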

The SLURM job ID is tracked in the queueJobId field for status monitoring. For local runs, queueJobId is set to local-{pid} (the detached process ID).
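Because `queueJobId` encodes the execution mode, a monitor can branch on its shape. This sketch assumes SLURM IDs are stored as plain numeric strings. Array or federated job IDs would need extra handling. All function names are hypothetical. For SLURM, the sketch only builds the `squeue` command rather than running it:

- `-h` suppresses the header.
- `-j` selects the job.
- `-o %T` prints only the job state.

Once a job has left the queue, `squeue` no longer reports it, so a monitor would need another source, such as `sacct`, for the final state.

```python
import os


def parse_queue_job_id(queue_job_id: str) -> tuple:
    """Split a queueJobId into (mode, id); assumes SLURM IDs are plain integers."""
    if queue_job_id.startswith("local-"):
        pid = queue_job_id[len("local-"):]
        if pid.isdigit():
            return ("local", int(pid))
    elif queue_job_id.isdigit():
        return ("slurm", queue_job_id)
    raise ValueError(f"unrecognised queueJobId: {queue_job_id!r}")


def local_process_alive(pid: int) -> bool:
    """Signal 0 performs the existence/permission check without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def slurm_state_command(job_id: str) -> list:
    """Build (but do not run) a squeue call that prints only the job's state."""
    return ["squeue", "-h", "-j", job_id, "-o", "%T"]
```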

Run Directory Structure

Each run creates a directory under the configured pipelineRunDir. The exact outputs differ by package, but the common execution files are:


{PIPELINE}-{YYYYMMDD}-{NNN}/
├── run.sh              # Generated launch script (run with `bash run.sh`)
├── samplesheet.csv     # Or another generated package input
├── nextflow.config     # Generated Nextflow config (when one is produced)
├── output/             # Pipeline outputs directory (passed to the pipeline as --outdir)
├── logs/
│   ├── pipeline.out    # stdout
│   └── pipeline.err    # stderr
├── trace.txt           # Process trace (TSV)
├── dag.dot             # Workflow DAG (Graphviz)
├── report.html         # Nextflow execution report
├── timeline.html       # Nextflow timeline report
└── ...                 # Package-specific outputs and artifacts

In SLURM mode, the scheduler also writes its own logs/slurm-%j.out and logs/slurm-%j.err files (where %j is the SLURM job ID). The pipeline monitor reads logs/pipeline.out and logs/pipeline.err.
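`trace.txt` is a tab-separated file whose default columns include `name` and `status`, so summarizing a run needs only the `csv` module. `trace_status_counts` and `failed_tasks` are hypothetical helpers. The sketch assumes the default trace fields are enabled.

```python
import csv
from collections import Counter
from pathlib import Path


def _rows(trace_path: Path):
    with open(trace_path, newline="") as fh:
        # Treat the trace as plain TSV, with no quote handling.
        yield from csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)


def trace_status_counts(trace_path: Path) -> Counter:
    """Count tasks per status (e.g. COMPLETED, FAILED, CACHED)."""
    return Counter(row["status"] for row in _rows(trace_path))


def failed_tasks(trace_path: Path) -> list:
    """Names of tasks whose status is FAILED, in trace order."""
    return [row["name"] for row in _rows(trace_path) if row["status"] == "FAILED"]
```

A non-empty `failed_tasks` list tells you which process names to look for in logs/pipeline.err.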