How SeqDesk Works
SeqDesk is a self-hosted platform for managing sequencing orders, running bioinformatics pipelines, and submitting results to public archives. It runs entirely on your own infrastructure, with no cloud dependencies; data leaves your network only when you submit results to an archive.
The Big Picture
SeqDesk connects three stages of a typical sequencing facility workflow:
- Order & Sample Management — Researchers submit sequencing requests, facility staff track and process them.
- Analysis — Automated pipelines (powered by Nextflow) assemble genomes, bin metagenomes, and produce quality reports.
- Submission — Results and metadata are packaged and submitted to the European Nucleotide Archive (ENA).
Data Flow
Researcher               Facility Admin
│                              │
├─ Create Order ──────────────▶│
├─ Add Samples                 │
├─ Submit ────────────────────▶│
│                              ├─ Assign FASTQ files to samples
│                              ├─ Create Study (group samples)
│                              ├─ Launch Pipeline (MAG, SubMG, …)
│                              ├─ Review results (assemblies, bins)
│                              └─ Submit to ENA
│                              │
◀── View results ──────────────┘
Key Entities
| Entity | Purpose |
|---|---|
| Order | A sequencing request containing one or more samples. Tracks status from draft through completion. |
| Sample | An individual biological sample with metadata (organism, taxonomy, MIxS fields). |
| Read | A FASTQ file (or pair of files) linked to a sample. Includes checksums for integrity. |
| Study | A logical grouping of samples for analysis and/or submission. |
| Pipeline Run | An execution of a Nextflow workflow against a set of samples. |
| Assembly | Contigs produced by an assembly pipeline (e.g. MAG). |
| Bin | A genome bin extracted from a metagenomic assembly, with completeness and contamination scores. |
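As a rough illustration of how these entities relate, the sketch below models a few of them as TypeScript types. Every field name here is an assumption made for illustration, as is the `submitted` status; none of it is taken from SeqDesk's actual database schema. The helper `samplesWithoutReads` is likewise hypothetical.

```typescript
// Hypothetical shapes for some entities in the table above. Field names are
// illustrative only and are not taken from SeqDesk's real schema.
type OrderStatus = "draft" | "submitted" | "completed"; // "submitted" is assumed

interface Read {
  file1: string; // path to the FASTQ file (R1 when paired)
  file2?: string; // path to R2 when the read is paired
  checksum1: string; // integrity checksum for file1
  checksum2?: string;
}

interface Sample {
  id: string;
  organism: string;
  mixs: { [field: string]: string }; // MIxS metadata fields
  reads: Read[];
}

interface Order {
  id: string;
  status: OrderStatus;
  samples: Sample[];
}

interface Study {
  id: string;
  sampleIds: string[]; // a study groups samples, possibly across orders
}

// Samples that cannot be analysed yet because no reads are linked to them.
function samplesWithoutReads(order: Order): Sample[] {
  return order.samples.filter((s) => s.reads.length === 0);
}
```

A check like `samplesWithoutReads` mirrors the data flow above: FASTQ files have to be assigned to samples before a pipeline can be launched on them.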
Architecture
SeqDesk is built as a single Next.js application that bundles the web UI, API layer, and pipeline orchestration into one process.
| Layer | Technology | Role |
|---|---|---|
| Frontend | React + Tailwind CSS | Interactive UI with real-time pipeline monitoring |
| API | Next.js API Routes | REST endpoints for all operations |
| Database | PostgreSQL | Persistent storage via Prisma ORM |
| Auth | NextAuth.js | Session-based authentication with role-based access |
| Pipelines | Nextflow | Workflow execution — local or SLURM cluster |
Pipeline System
Pipelines are defined as self-contained packages with a manifest-driven architecture. Each pipeline package includes:
- manifest.json — declares inputs, outputs, parameters, and execution commands
- definition.json — describes the workflow DAG for visualization
- samplesheet.yaml — declarative rules for generating input samplesheets
- parsers/ — YAML definitions for extracting structured results from output files
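To make the manifest idea more concrete, here is a minimal sketch of what a manifest type and a structural check might look like. The keys (`id`, `inputs`, `outputs`, `parameters`, `command`), the example pipeline name, and the `min_length` parameter are all hypothetical; SeqDesk's real manifest.json schema may differ.

```typescript
// Hypothetical manifest shape. These keys are illustrative and do not
// reflect SeqDesk's actual manifest.json schema.
interface PipelineManifest {
  id: string;
  inputs: string[];
  outputs: string[];
  parameters: { [name: string]: { type: "string" | "number" | "boolean"; default?: unknown } };
  command: string[];
}

// Return a list of structural problems; an empty list means the manifest
// passed these (deliberately shallow) checks.
function validateManifest(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["manifest must be an object"];
  const m = raw as { [key: string]: unknown };
  const errors: string[] = [];
  if (typeof m["id"] !== "string" || m["id"] === "") {
    errors.push("id must be a non-empty string");
  }
  for (const key of ["inputs", "outputs", "command"]) {
    const value = m[key];
    if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
      errors.push(key + " must be an array of strings");
    }
  }
  const params = m["parameters"];
  if (typeof params !== "object" || params === null || Array.isArray(params)) {
    errors.push("parameters must be an object");
  }
  return errors;
}

// A made-up manifest for a made-up pipeline.
const exampleManifest: PipelineManifest = {
  id: "example-pipeline",
  inputs: ["samplesheet"],
  outputs: ["assemblies", "bins"],
  parameters: { min_length: { type: "number", default: 1000 } },
  command: ["nextflow", "run", "example/pipeline"],
};
```

Validating a manifest before registering a package is one way a manifest-driven design can fail early, instead of at launch time.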
When a pipeline runs, SeqDesk:
- Generates the samplesheet from the study’s samples and reads
- Builds the Nextflow command with configured parameters
- Executes via local process or SLURM submission
- Monitors progress through trace files and weblog events
- Discovers output files and parses results into the database
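The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not SeqDesk's implementation: the three-column CSV layout, the `--input` and `--outdir` pipeline parameters, and the `local`/`slurm` profile names are all assumed. The `-profile`, `-with-trace`, and `-with-weblog` options belong to `nextflow run`, though weblog support can vary with the Nextflow version.

```typescript
// Hypothetical input shape; the real samplesheet rules live in samplesheet.yaml.
interface SampleReads {
  sampleId: string;
  fastq1: string;
  fastq2?: string; // absent for single-end or long-read data
}

// Step 1: render a CSV samplesheet using an assumed three-column layout.
function buildSamplesheet(rows: SampleReads[]): string {
  const header = "sample,fastq_1,fastq_2";
  const lines = rows.map((r) => [r.sampleId, r.fastq1, r.fastq2 ?? ""].join(","));
  return [header, ...lines].join("\n") + "\n";
}

interface RunOptions {
  pipeline: string;
  samplesheetPath: string;
  outdir: string;
  profile: "local" | "slurm"; // assumed profile names
  weblogUrl?: string;
  params?: { [name: string]: string | number | boolean };
}

// Step 2: build the argument list for `nextflow`. Returning an array (rather
// than one shell string) avoids quoting problems when spawning the process.
function buildNextflowArgs(opts: RunOptions): string[] {
  const args = ["run", opts.pipeline, "-profile", opts.profile, "-with-trace"];
  if (opts.weblogUrl) args.push("-with-weblog", opts.weblogUrl);
  // --input and --outdir are pipeline-level parameters (assumed names).
  args.push("--input", opts.samplesheetPath, "--outdir", opts.outdir);
  const params = opts.params ?? {};
  for (const key of Object.keys(params)) {
    args.push("--" + key, String(params[key]));
  }
  return args;
}
```

The resulting array could be passed to a process spawner for local execution or embedded in a SLURM batch script for cluster execution.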
Run-status lifecycle
A pipeline run moves through these statuses:
pending → queued → running → completed / failed / cancelled
- pending — the run row exists but execution hasn’t started.
- queued — submitted to SLURM and waiting for resources (local runs move straight to running).
- running — Nextflow is executing processes.
- completed / failed — terminal outcomes after all processes finish or one fails.
- cancelled — terminal; the run was stopped manually.
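The lifecycle above can be read as a small state machine. The sketch below encodes it as a transition table; it assumes that a run can be cancelled from any non-terminal status and that `failed` is only reached from `running`, neither of which is spelled out above.

```typescript
type RunStatus = "pending" | "queued" | "running" | "completed" | "failed" | "cancelled";

// Allowed next statuses for each status. Assumptions: cancellation is possible
// from any non-terminal status, and failure only happens while running.
const TRANSITIONS: { [S in RunStatus]: RunStatus[] } = {
  pending: ["queued", "running", "cancelled"], // local runs skip "queued"
  queued: ["running", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

// A status is terminal when nothing can follow it.
function isTerminal(status: RunStatus): boolean {
  return TRANSITIONS[status].length === 0;
}

// Return the new status, or throw if the lifecycle does not allow the move.
function transition(from: RunStatus, to: RunStatus): RunStatus {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error("Illegal transition: " + from + " -> " + to);
  }
  return to;
}
```

Keeping the allowed moves in one table makes it straightforward to reject out-of-order updates, for example a late progress event arriving after a run was cancelled.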
Live progress relies on Nextflow’s weblog callbacks, which post events back to SeqDesk. A weblog secret must be configured for those callbacks to be accepted — without it the endpoint rejects events and SeqDesk falls back to reading trace files. See the Pipelines monitoring docs for setup.
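A minimal sketch of that accept-or-fall-back behaviour is shown below. How the secret actually travels with a callback (header, query parameter, or path) is not described here, so the functions simply take the configured secret and the token presented with the event; all names are hypothetical. The hand-written comparison keeps the example dependency-free, and in a Node.js deployment `crypto.timingSafeEqual` would be the more usual choice.

```typescript
type MonitoringMode = "weblog" | "trace";

// With no secret configured the endpoint rejects events, so progress has to
// come from trace files instead.
function monitoringMode(configuredSecret: string | undefined): MonitoringMode {
  return configuredSecret ? "weblog" : "trace";
}

// Compare two strings without exiting at the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Accept an event only when a secret is configured and the token matches it.
function acceptWeblogEvent(
  configuredSecret: string | undefined,
  presentedToken: string | undefined
): boolean {
  if (!configuredSecret || !presentedToken) return false;
  return constantTimeEqual(configuredSecret, presentedToken);
}
```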
Configuration Resolution
SeqDesk merges configuration from multiple sources in this priority order:
- Environment variables — highest priority, ideal for deployment automation
- Config file — seqdesk.config.json for structured, version-controlled settings
- Database — runtime settings changed through the admin UI
- Defaults — built-in fallback values
This layered approach lets you override specific values at deployment time while keeping the rest configurable through the UI.
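The priority order can be sketched as a simple layered merge. This is an illustration only: the setting keys in the usage example are made up, and it assumes that environment variables have already been mapped to the same key names as the other sources.

```typescript
type ConfigValues = { [key: string]: string | number | boolean | undefined };

// Arguments are ordered from highest to lowest priority, matching the list above.
function resolveConfig(
  env: ConfigValues,
  file: ConfigValues,
  database: ConfigValues,
  defaults: ConfigValues
): ConfigValues {
  const sources = [env, file, database, defaults];
  const resolved: ConfigValues = {};
  // Walk from lowest to highest priority so that later assignments win.
  for (let i = sources.length - 1; i >= 0; i--) {
    const source = sources[i];
    for (const key of Object.keys(source)) {
      // An undefined value means "not set here", so it never masks a lower layer.
      if (source[key] !== undefined) resolved[key] = source[key];
    }
  }
  return resolved;
}

// Hypothetical keys, for illustration only.
const resolvedExample = resolveConfig(
  { dataDir: "/mnt/seq" }, // environment
  { dataDir: "/data", executor: "slurm" }, // seqdesk.config.json
  { executor: "local", siteName: "Core Facility" }, // database (admin UI)
  { dataDir: "./data", executor: "local", siteName: "SeqDesk", port: 3000 } // defaults
);
```

In this example `dataDir` comes from the environment, `executor` from the config file, `siteName` from the database, and `port` from the defaults.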
Getting Data In
There are two ways FASTQ data reaches a sample:
- File scan (default). After sequencing finishes, the admin browses the configured data directory, and SeqDesk auto-detects FASTQ files and pairs R1/R2 reads so they can be assigned to samples. This is the path described in the data-flow diagram above.
- Live MinKNOW stream ingest (Oxford Nanopore). For ONT runs, an admin can attach a running MinKNOW sequencing run to an order and have reads ingested as MinKNOW writes them, rather than waiting for the run to finish. A long-lived stream-monitor daemon (run outside the web app) watches MinKNOW’s output root and links new FASTQ files to samples by barcode. Admins configure and enable it at Admin → MinKNOW Stream (/admin/minknow-stream); per-order stream control lives under the order’s Sequencing Data → Stream tab, backed by the /api/orders/[id]/stream APIs (start, events, by-barcode, stop). If the stream-monitor daemon isn’t running, nothing is ingested even when the config is saved. The underlying StreamRun / StreamIngestedFile / StreamRunEvent records track each live session.
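For the file-scan path, R1/R2 pairing can be sketched as grouping file names by a shared stem. The naming convention assumed here (`<stem>_R1` / `<stem>_R2`, an optional numeric suffix such as `_001`, and a `.fastq` or `.fq` extension, optionally gzipped) is a common Illumina-style pattern and not necessarily the exact rule SeqDesk applies.

```typescript
interface ReadPair {
  stem: string; // shared file-name prefix, used as the pairing key
  r1?: string;
  r2?: string;
}

// Assumed convention: <stem>_R1[_NNN].fastq[.gz] and <stem>_R2[_NNN].fastq[.gz]
// (".fq" is accepted as well).
const FASTQ_PATTERN = /^(.+?)_R([12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$/;

// Group file names into pairs, preserving first-seen order of stems.
// Files that do not match the pattern are ignored.
function pairFastqFiles(fileNames: string[]): ReadPair[] {
  const byStem: { [stem: string]: ReadPair } = Object.create(null);
  const order: string[] = [];
  for (const fileName of fileNames) {
    const match = FASTQ_PATTERN.exec(fileName);
    if (!match) continue;
    const stem = match[1];
    let pair = byStem[stem];
    if (!pair) {
      pair = { stem: stem };
      byStem[stem] = pair;
      order.push(stem);
    }
    if (match[2] === "1") pair.r1 = fileName;
    else pair.r2 = fileName;
  }
  return order.map((stem) => byStem[stem]);
}
```

A pair with only `r1` or only `r2` set signals a missing mate, which an admin would want to see before assigning the files to a sample.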
Two Roles
SeqDesk uses a simple role model:
Researcher — creates orders, adds samples, views results. Can only see their own data (unless department sharing is enabled).
Facility Admin — full access to all orders, studies, and settings. Can configure the system, run pipelines, and submit to ENA.
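The two-role model could be expressed as a pair of permission checks along the following lines. The role identifiers, the `department` fields, and the way department sharing is passed in are assumptions for illustration, not SeqDesk's actual authorization code.

```typescript
type Role = "researcher" | "facility_admin"; // hypothetical identifiers

interface SeqDeskUser {
  id: string;
  role: Role;
  department?: string;
}

interface OrderOwnership {
  ownerId: string;
  ownerDepartment?: string;
}

// Facility admins see everything; researchers see their own orders, plus
// orders from their department when department sharing is enabled.
function canViewOrder(
  user: SeqDeskUser,
  order: OrderOwnership,
  departmentSharing: boolean
): boolean {
  if (user.role === "facility_admin") return true;
  if (order.ownerId === user.id) return true;
  return (
    departmentSharing &&
    user.department !== undefined &&
    user.department === order.ownerDepartment
  );
}

// Running pipelines and submitting to ENA are admin-only operations.
function canRunPipelines(user: SeqDeskUser): boolean {
  return user.role === "facility_admin";
}
```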
Self-Hosted by Design
SeqDesk is designed to run on your own infrastructure:
- No external dependencies — everything runs locally (database, file storage, pipeline execution)
- Your data stays on your network — sequencing files are referenced by path, never uploaded to a remote service
- Single-command install — npm i -g seqdesk && seqdesk gets you running
- Automatic updates — built-in update system with backup and rollback support