How SeqDesk Works

SeqDesk is a self-hosted platform for managing sequencing orders, running bioinformatics pipelines, and submitting results to public archives. It runs entirely on your own infrastructure with no cloud dependencies, and data leaves your network only when you choose to submit it to an archive.

The Big Picture

SeqDesk connects three stages of a typical sequencing facility workflow:

Sequencing Order & Sample Management — Researchers submit sequencing requests, facility staff track and process them.
Analysis — Automated pipelines (powered by Nextflow) assemble genomes, bin metagenomes, and produce quality reports.
Submission — Results and metadata are packaged and submitted to the European Nucleotide Archive (ENA).

Data Flow


Researcher                    Facility Admin
    │                              │
    ├─ Create Order ──────────────▶│
    ├─ Add Samples                 │
    ├─ Submit ────────────────────▶│
    │                              ├─ Assign FASTQ files to samples
    │                              ├─ Create Study (group samples)
    │                              ├─ Launch Pipeline (MAG, SubMG, …)
    │                              ├─ Review results (assemblies, bins)
    │                              └─ Submit to ENA
    │                              │
    ◀── View results ──────────────┘

Key Entities

Entity	Purpose
Sequencing Order	A sequencing request containing one or more samples (the `Order` model in the schema). Tracks status from draft through completion.
Sample	An individual biological sample with metadata (organism, taxonomy, MIxS fields).
Read	A FASTQ file (or pair of files) linked to a sample. Includes checksums for integrity.
Study	A logical grouping of samples for analysis and/or submission.
Pipeline Run	An execution of a Nextflow workflow against a set of samples.
Assembly	Contigs produced by an assembly pipeline (e.g. MAG).
Bin	A genome bin extracted from a metagenomic assembly, with completeness and contamination scores.
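
The relationships in the table can be pictured as a handful of TypeScript types. This is only an illustrative sketch: apart from Order, which the table names as a model, every type and field name below is an assumption and may not match SeqDesk's actual Prisma schema.

```typescript
// Illustrative types only. Apart from Order, which the table names as a model,
// every type and field name here is an assumption, not SeqDesk's actual schema.
interface Read {
  file1: string;       // FASTQ path (R1 for paired-end data)
  file2?: string;      // R2 path when the read is a pair of files
  checksum1?: string;  // integrity checksum for file1
  checksum2?: string;  // integrity checksum for file2
}

interface Sample {
  id: string;
  organism?: string;
  taxId?: number;
  mixs: Record<string, string>; // MIxS checklist fields
  reads: Read[];
}

interface Order {
  id: string;
  status: string; // e.g. "draft"; the full status list is not given here
  samples: Sample[];
}

interface Study {
  id: string;
  sampleIds: string[]; // samples grouped for analysis and/or submission
}

interface Bin {
  name: string;
  completeness: number;
  contamination: number;
}

interface Assembly {
  sampleId: string;
  contigsPath: string;
  bins: Bin[];
}

interface PipelineRun {
  id: string;
  pipeline: string;    // e.g. "mag"
  sampleIds: string[];
}

// Samples in an order that have no FASTQ files assigned yet.
function samplesWithoutReads(order: Order): Sample[] {
  return order.samples.filter((s) => s.reads.length === 0);
}
```

A helper like samplesWithoutReads shows why the Order → Sample → Read nesting is convenient: an admin view could use it to list the samples that still need files assigned.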

Architecture

SeqDesk is built as a single Next.js application that bundles the web UI, API layer, and pipeline orchestration into one process.

Layer	Technology	Role
Frontend	React + Tailwind CSS	Interactive UI with real-time pipeline monitoring
API	Next.js API Routes	REST endpoints for all operations
Database	PostgreSQL 14+	Persistent storage via Prisma ORM (SQLite is not supported)
Auth	NextAuth.js	Session-based authentication with role-based access
Pipelines	Nextflow	Workflow execution — local or SLURM cluster

You normally do not supply the database yourself. The installer either reuses a local PostgreSQL server that it can administer, or creates and owns a private, socket-only cluster under $HOME/.seqdesk/postgres (SEQDESK_PG_HOME). In the latter case, the start.sh script in the install directory starts that cluster before it starts the app.

Pipeline System

Pipelines are defined as self-contained packages with a manifest-driven architecture. Each pipeline package includes:

manifest.json — declares inputs, outputs, parameters, and execution commands
definition.json — describes the workflow DAG for visualization
samplesheet.yaml — declarative rules for generating input samplesheets
parsers/ — YAML definitions for extracting structured results from output files
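
As a rough illustration of the package layout, the sketch below checks a directory listing against the four entries named above. It assumes all four are mandatory, which the list implies but does not state, and the function name is hypothetical rather than part of SeqDesk.

```typescript
// The four entries the documentation lists for a pipeline package.
const REQUIRED_ENTRIES = [
  "manifest.json",
  "definition.json",
  "samplesheet.yaml",
  "parsers",
] as const;

// Returns the expected entries that are absent from a package directory.
// `entries` is assumed to hold the top-level names found in that directory;
// a trailing slash on directory names (e.g. "parsers/") is tolerated.
function missingPackageEntries(entries: readonly string[]): string[] {
  const present = new Set(entries.map((e) => e.replace(/\/$/, "")));
  return REQUIRED_ENTRIES.filter((name) => !present.has(name));
}
```

For example, a listing that contains only manifest.json and parsers/ would report definition.json and samplesheet.yaml as missing.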

When a pipeline runs, SeqDesk:

1. Generates the samplesheet from the study’s samples and reads
2. Builds the Nextflow command with configured parameters
3. Executes via local process or SLURM submission
4. Monitors progress through trace files and weblog events
5. Discovers output files and parses results into the database
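
The command-building step can be sketched as a function that turns run options into an argument list for the nextflow binary. The -profile, -with-trace, and -with-weblog flags are standard options of nextflow run; everything else is an assumption made for illustration, including the RunOptions shape, the profile names, the trace file location, and the nf-core-style --input and --outdir parameters. SeqDesk's real command comes from each pipeline's manifest.json.

```typescript
// Hypothetical options object; not SeqDesk's actual internal type.
interface RunOptions {
  pipeline: string;      // workflow script or repository passed to `nextflow run`
  samplesheet: string;   // path to the generated samplesheet
  outdir: string;        // run output directory
  profile: string;       // assumed profile name, e.g. "local" or "slurm"
  params: Record<string, string | number | boolean>;
  weblogUrl?: string;    // callback URL for weblog events, if configured
}

// Builds the argument vector for `nextflow`, without invoking anything.
function buildNextflowArgs(opts: RunOptions): string[] {
  const args = [
    "run", opts.pipeline,
    "-profile", opts.profile,
    "-with-trace", `${opts.outdir}/trace.txt`,
  ];
  if (opts.weblogUrl !== undefined) {
    args.push("-with-weblog", opts.weblogUrl);
  }
  // Pipeline parameters use a double dash; --input/--outdir are assumed names.
  args.push("--input", opts.samplesheet, "--outdir", opts.outdir);
  for (const [key, value] of Object.entries(opts.params)) {
    args.push(`--${key}`, String(value));
  }
  return args;
}
```

Returning an array rather than a single shell string keeps paths with spaces intact and avoids shell quoting problems when the arguments are later handed to a process spawner or written into a SLURM batch script.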

Run-status lifecycle

A pipeline run moves through these statuses:

pending → queued → running → completed / failed / cancelled

pending — the run row exists but execution hasn’t started.
queued — submitted to SLURM and waiting for resources (local runs move straight to running).
running — Nextflow is executing processes.
completed / failed — terminal outcomes after all processes finish or one fails.
cancelled — terminal; the run was stopped manually.
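
The lifecycle above amounts to a small state machine, sketched here as a transition table. The status names come from the list above; the table itself is an assumption. In particular, it assumes a run can be cancelled from any non-terminal status and that only a running run can complete or fail, which the text does not spell out.

```typescript
type RunStatus =
  | "pending" | "queued" | "running"
  | "completed" | "failed" | "cancelled";

// Allowed next statuses for each status (assumed, see lead-in).
const TRANSITIONS: Record<RunStatus, readonly RunStatus[]> = {
  pending: ["queued", "running", "cancelled"], // local runs skip "queued"
  queued: ["running", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

// True when moving from `from` to `to` is allowed by the table.
function canTransition(from: RunStatus, to: RunStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Terminal statuses have no outgoing transitions.
function isTerminal(status: RunStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```

Guarding status updates with a check like canTransition prevents a late event, such as a delayed weblog callback, from moving a cancelled run back to running.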

Live progress relies on Nextflow’s weblog callbacks, which post events back to SeqDesk. A weblog secret must be configured for those callbacks to be accepted — without it the endpoint rejects events and SeqDesk falls back to reading trace files. See the Pipelines monitoring docs for setup.
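
The trace-file fallback can be pictured as a small parser. A Nextflow trace file is tab-separated with a header row, and its default columns include status. The sketch below counts tasks per status. How SeqDesk actually reads the file is not described here, so the function name and the return shape are assumptions.

```typescript
// Counts tasks per status in the text of a Nextflow trace file.
// Relies only on the header row containing a "status" column.
function summarizeTrace(tsv: string): Record<string, number> {
  const lines = tsv.split(/\r?\n/).filter((line) => line.trim() !== "");
  const [headerLine, ...rows] = lines;
  if (headerLine === undefined) return {}; // empty file: no tasks yet

  const statusIdx = headerLine.split("\t").indexOf("status");
  if (statusIdx === -1) {
    throw new Error("trace file has no 'status' column");
  }

  const counts: Record<string, number> = {};
  for (const row of rows) {
    const cols = row.split("\t");
    const status = cols[statusIdx] ?? "UNKNOWN"; // short or malformed row
    counts[status] = (counts[status] ?? 0) + 1;
  }
  return counts;
}
```

Polling a summary like this gives coarser progress than weblog events, since it reflects only what Nextflow has already written to disk.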

Configuration Resolution

SeqDesk merges configuration from multiple sources in this priority order:

1. Environment variables: highest priority, ideal for deployment automation
2. Config file: settings.json in the install root, created with mode 600 (readable and writable by the owner only) and symlinked into the active release so that there is exactly one config file. Installs created before the settings consolidation keep the legacy name seqdesk.config.json.
3. Database: runtime settings changed through the admin UI
4. Defaults: built-in fallback values

This layered approach lets you override specific values at deployment time while keeping the rest configurable through the UI.
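
The priority order can be sketched as a lookup that walks the four layers from highest to lowest priority and returns the first value it finds. The layer names mirror the list above; the setting keys in the usage note and the idea that each layer is already normalized to a flat key/value object are assumptions for illustration.

```typescript
// Layers in priority order, highest first.
const SOURCES = ["env", "file", "database", "defaults"] as const;
type Source = (typeof SOURCES)[number];

// Each layer is assumed to be a flat map; undefined means "not set here".
type Layer = Record<string, string | undefined>;

// Returns the winning value for `key` and the layer it came from,
// or undefined when no layer defines the key.
function resolveSetting(
  key: string,
  layers: Record<Source, Layer>,
): { value: string; source: Source } | undefined {
  for (const source of SOURCES) {
    const value = layers[source][key];
    if (value !== undefined) return { value, source };
  }
  return undefined;
}
```

With hypothetical keys, a dataDir set both in the environment and in the database resolves to the environment value, while a key set only in the defaults falls through to that layer. Returning the source alongside the value makes it easy to show an admin why a UI change has no effect when an environment variable overrides it.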

Getting Data In

There are two ways FASTQ data reaches a sample:

File scan (default). After sequencing finishes, the admin browses the configured data directory, and SeqDesk auto-detects FASTQ files and pairs R1/R2 reads so they can be assigned to samples. This is the path described in the data-flow diagram above.
Live MinKNOW stream ingest (Oxford Nanopore). For ONT runs, an admin can attach a running MinKNOW sequencing run to a sequencing order and have reads ingested as MinKNOW writes them, rather than waiting for the run to finish. A long-lived stream-monitor daemon (run outside the web app) watches MinKNOW’s output root and links new FASTQ files to samples by barcode. Admins configure and enable it at Admin → MinKNOW Stream (/admin/minknow-stream); per-order stream control lives under the sequencing order’s Sequencing Data → Stream tab, backed by the /api/orders/[id]/stream APIs (start, events, by-barcode, stop). If the stream-monitor daemon isn’t running, nothing is ingested even when the config is saved. The underlying StreamRun / StreamIngestedFile / StreamRunEvent records track each live session.
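
To make the R1/R2 pairing in the file-scan path concrete, here is a minimal sketch that groups file paths by a shared name prefix. It assumes an Illumina-style naming convention (_R1 or _R2 before the extension, with an optional numeric suffix such as _001). SeqDesk's actual detection rules are not documented here and may differ, and the ReadPair type is hypothetical.

```typescript
// Hypothetical result type: one entry per detected filename prefix.
interface ReadPair {
  prefix: string; // filename up to the _R1/_R2 marker
  r1?: string;    // full path of the forward reads, if found
  r2?: string;    // full path of the reverse reads, if found
}

// Matches e.g. "sampleA_R1_001.fastq.gz" or "sampleB_R2.fq" (assumed convention).
const FASTQ_PATTERN = /^(.+?)_R([12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$/;

// Groups FASTQ paths into pairs; files that do not match are ignored.
function pairFastqFiles(paths: readonly string[]): ReadPair[] {
  const pairs = new Map<string, ReadPair>();
  for (const path of paths) {
    const name = path.split("/").pop() ?? path;
    const match = FASTQ_PATTERN.exec(name);
    if (match === null) continue;
    const prefix = match[1];
    const mate = match[2];
    if (prefix === undefined || mate === undefined) continue;

    const pair = pairs.get(prefix) ?? { prefix };
    if (mate === "1") pair.r1 = path;
    else pair.r2 = path;
    pairs.set(prefix, pair);
  }
  return Array.from(pairs.values());
}
```

A pair with only r1 set represents single-end data or a missing mate, which is the kind of case an admin would want to review before assigning files to a sample.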

Two Roles

SeqDesk uses a simple role model:

Researcher — creates sequencing orders, adds samples, views results. Can only see their own data (unless department sharing is enabled).

Facility Admin — full access to all sequencing orders, studies, and settings. Can configure the system, run pipelines, and submit to ENA.
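
The visibility rule for the two roles can be sketched as a single predicate. The role names, the user and order shapes, and the interpretation of department sharing (members of the owner's department may view the order) are all assumptions. SeqDesk's real checks live in its API layer and may be more detailed.

```typescript
// Hypothetical role identifiers and record shapes.
type Role = "RESEARCHER" | "FACILITY_ADMIN";

interface User {
  id: string;
  role: Role;
  departmentId?: string;
}

interface OrderRef {
  ownerId: string;
  ownerDepartmentId?: string;
}

// Decides whether `user` may view `order`.
function canViewOrder(
  user: User,
  order: OrderRef,
  departmentSharing: boolean,
): boolean {
  if (user.role === "FACILITY_ADMIN") return true; // admins see everything
  if (order.ownerId === user.id) return true;      // researchers see their own data
  // Assumed meaning of department sharing: same department as the owner.
  return (
    departmentSharing &&
    user.departmentId !== undefined &&
    user.departmentId === order.ownerDepartmentId
  );
}
```

Keeping the rule in one function means every list and detail endpoint can apply the same decision instead of repeating role checks.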

Self-Hosted by Design

SeqDesk is designed to run on your own infrastructure:

No external runtime dependencies: the database, file storage, and pipeline execution all run locally
Your data stays on your network: sequencing files are referenced by path, never uploaded to a remote service
Guided launcher install: npm i -g seqdesk@latest && seqdesk --interactive gets you running
Automatic updates: built-in update system with backup and rollback support