How SeqDesk Works

SeqDesk is a self-hosted platform for managing sequencing orders, running bioinformatics pipelines, and submitting results to public archives. It runs entirely on your own infrastructure — no cloud dependencies, no data leaving your network.

The Big Picture

SeqDesk connects three stages of a typical sequencing facility workflow:

  1. Order & Sample Management — Researchers submit sequencing requests, facility staff track and process them.
  2. Analysis — Automated pipelines (powered by Nextflow) assemble genomes, bin metagenomes, and produce quality reports.
  3. Submission — Results and metadata are packaged and submitted to the European Nucleotide Archive (ENA).

Data Flow

Researcher                     Facility Admin
│                              │
├─ Create Order ──────────────▶│
├─ Add Samples                 │
├─ Submit ────────────────────▶│
│                              ├─ Assign FASTQ files to samples
│                              ├─ Create Study (group samples)
│                              ├─ Launch Pipeline (MAG, SubMG, …)
│                              ├─ Review results (assemblies, bins)
│                              └─ Submit to ENA
│                              │
◀── View results ──────────────┘

Key Entities

Entity       | Purpose
Order        | A sequencing request containing one or more samples. Tracks status from draft through completion.
Sample       | An individual biological sample with metadata (organism, taxonomy, MIxS fields).
Read         | A FASTQ file (or pair of files) linked to a sample. Includes checksums for integrity.
Study        | A logical grouping of samples for analysis and/or submission.
Pipeline Run | An execution of a Nextflow workflow against a set of samples.
Assembly     | Contigs produced by an assembly pipeline (e.g. MAG).
Bin          | A genome bin extracted from a metagenomic assembly, with completeness and contamination scores.
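
To make the relationships between these entities concrete, here is a minimal TypeScript sketch. All field names (`file1`, `checksum1`, `mixs`, `sampleIds`, and so on) and the helper `samplesMissingReads` are hypothetical and chosen for illustration only; the actual SeqDesk schema may be organised differently.

```typescript
// Hypothetical shapes for illustration; not the actual SeqDesk schema.
interface Read {
  file1: string;      // path to the FASTQ file (R1 for paired data)
  file2?: string;     // path to the R2 mate, when the data is paired
  checksum1: string;  // integrity checksum for file1
  checksum2?: string; // integrity checksum for file2
}

interface Sample {
  id: string;
  organism: string;
  mixs: Record<string, string>; // MIxS metadata fields
  reads: Read[];
}

interface Study {
  id: string;
  title: string;
  sampleIds: string[]; // a study groups samples for analysis and/or submission
}

interface Bin {
  completeness: number;  // score reported for the genome bin
  contamination: number; // score reported for the genome bin
}

// List the samples of a study that have no reads assigned yet.
function samplesMissingReads(study: Study, samples: Sample[]): string[] {
  return samples
    .filter((s) => study.sampleIds.indexOf(s.id) !== -1 && s.reads.length === 0)
    .map((s) => s.id);
}
```

A check like `samplesMissingReads` is the kind of validation one might run before launching a pipeline, since a samplesheet can only be generated for samples that have reads.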

Architecture

SeqDesk is built as a single Next.js application that bundles the web UI, API layer, and pipeline orchestration into one process.

Layer     | Technology            | Role
Frontend  | React + Tailwind CSS  | Interactive UI with real-time pipeline monitoring
API       | Next.js API Routes    | REST endpoints for all operations
Database  | PostgreSQL            | Persistent storage via Prisma ORM
Auth      | NextAuth.js           | Session-based authentication with role-based access
Pipelines | Nextflow              | Workflow execution, local or on a SLURM cluster

Pipeline System

Pipelines are defined as self-contained packages with a manifest-driven architecture. Each pipeline package includes:

  • manifest.json — declares inputs, outputs, parameters, and execution commands
  • definition.json — describes the workflow DAG for visualization
  • samplesheet.yaml — declarative rules for generating input samplesheets
  • parsers/ — YAML definitions for extracting structured results from output files

When a pipeline runs, SeqDesk:

  1. Generates the samplesheet from the study’s samples and reads
  2. Builds the Nextflow command with configured parameters
  3. Executes via local process or SLURM submission
  4. Monitors progress through trace files and weblog events
  5. Discovers output files and parses results into the database
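
Step 2 above can be sketched as a function that assembles the argument list for `nextflow run`. This is an illustration under stated assumptions, not SeqDesk's actual command builder: the `RunOptions` type and `buildNextflowArgs` are hypothetical names, and the `--input` / `--outdir` pipeline parameters follow the nf-core convention, which a given pipeline package may not use. The single-dash options (`-profile`, `-with-trace`, `-with-weblog`, `-resume`) are standard Nextflow run options.

```typescript
// Hypothetical option bag; names are illustrative only.
interface RunOptions {
  pipeline: string;    // pipeline name or local path
  samplesheet: string; // generated samplesheet file
  outdir: string;      // where results are written
  profile: string;     // Nextflow config profile, e.g. "docker"
  traceFile: string;   // trace file used for progress monitoring
  weblogUrl?: string;  // endpoint that receives weblog events
  resume?: boolean;    // reuse cached results from a previous run
}

// Build the argv passed to the `nextflow` executable.
function buildNextflowArgs(o: RunOptions): string[] {
  const args = ["run", o.pipeline, "-profile", o.profile, "-with-trace", o.traceFile];
  if (o.weblogUrl) {
    args.push("-with-weblog", o.weblogUrl);
  }
  if (o.resume) {
    args.push("-resume");
  }
  // Double-dash arguments are pipeline parameters (nf-core convention assumed).
  args.push("--input", o.samplesheet, "--outdir", o.outdir);
  return args;
}
```

Returning an array rather than a single string avoids shell-quoting problems when the arguments are handed to a process spawner or written into a SLURM batch script.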

Run-status lifecycle

A pipeline run moves through these statuses:

pending → queued → running → completed / failed / cancelled

  • pending — the run row exists but execution hasn’t started.
  • queued — submitted to SLURM and waiting for resources (local runs move straight to running).
  • running — Nextflow is executing processes.
  • completed / failed — terminal outcomes after all processes finish or one fails.
  • cancelled — terminal; the run was stopped manually.
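
The lifecycle above can be modelled as a small state machine. The sketch below is an assumption-laden illustration: the `TRANSITIONS` table encodes the documented path plus the guess that a run can be cancelled from any non-terminal status, and the names `canTransition` and `isTerminal` are hypothetical.

```typescript
type RunStatus = "pending" | "queued" | "running" | "completed" | "failed" | "cancelled";

// Allowed next statuses for each status.
const TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  // Local runs skip the queue and go straight to "running".
  pending: ["queued", "running", "cancelled"],
  // Assumption: a queued run can be cancelled before it starts.
  queued: ["running", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  // Terminal statuses have no outgoing transitions.
  completed: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

function isTerminal(status: RunStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```

Guarding status updates with a check like `canTransition` keeps a late or duplicated monitoring event from moving a finished run back to `running`.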

Live progress relies on Nextflow’s weblog callbacks, which post events back to SeqDesk. A weblog secret must be configured for those callbacks to be accepted — without it the endpoint rejects events and SeqDesk falls back to reading trace files. See the Pipelines monitoring docs for setup.
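
The accept-or-reject behaviour described above might look like the following sketch. How the secret actually travels with a weblog request is not specified here, so the function simply takes the configured and the received value as strings; `acceptWeblogEvent`, `monitoringMode`, and `constantTimeEqual` are hypothetical names.

```typescript
// Compare two strings without exiting early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) {
    return false;
  }
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// An event is accepted only when a secret is configured and matches.
function acceptWeblogEvent(configured: string | undefined, received: string | undefined): boolean {
  if (!configured || !received) {
    return false; // no secret configured (or none sent): reject the event
  }
  return constantTimeEqual(configured, received);
}

// Which progress source is effectively in use for a run.
function monitoringMode(configured: string | undefined): "weblog" | "trace-fallback" {
  return configured ? "weblog" : "trace-fallback";
}
```

In a Node.js server, `crypto.timingSafeEqual` is the usual choice for the comparison; the hand-written loop here only keeps the example free of imports.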

Configuration Resolution

SeqDesk merges configuration from multiple sources in this priority order:

  1. Environment variables — highest priority, ideal for deployment automation
  2. Config file: seqdesk.config.json for structured, version-controlled settings
  3. Database — runtime settings changed through the admin UI
  4. Defaults — built-in fallback values

This layered approach lets you override specific values at deployment time while keeping the rest configurable through the UI.
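
The priority order can be illustrated with a small merge function that applies sources from lowest to highest priority, so that later layers win. This is a sketch only: the `Settings` keys (`port`, `dataDir`, `executor`) and the function name `resolveConfig` are hypothetical, and values from environment variables are assumed to be parsed into typed values already.

```typescript
// Hypothetical settings; the real keys are not listed in this document.
interface Settings {
  port: number;
  dataDir: string;
  executor: "local" | "slurm";
}

// Merge layers from lowest to highest priority; undefined values never override.
function resolveConfig<T extends object>(defaults: T, layersLowToHigh: Array<Partial<T>>): T {
  const result = { ...defaults };
  for (const layer of layersLowToHigh) {
    for (const key of Object.keys(layer) as Array<keyof T>) {
      const value = layer[key];
      if (value !== undefined) {
        result[key] = value as T[keyof T];
      }
    }
  }
  return result;
}

const defaults: Settings = { port: 3000, dataDir: "/data", executor: "local" };
const fromDatabase: Partial<Settings> = { executor: "slurm", dataDir: "/mnt/seq" };
const fromFile: Partial<Settings> = { dataDir: "/srv/fastq" };
const fromEnv: Partial<Settings> = { port: 8080 };

// Database < config file < environment variables.
const effective = resolveConfig(defaults, [fromDatabase, fromFile, fromEnv]);
```

In this example `effective.port` comes from the environment, `effective.dataDir` from the config file (overriding the database value), and `effective.executor` from the database.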

Getting Data In

There are two ways FASTQ data reaches a sample:

  1. File scan (default). After sequencing finishes, the admin browses the configured data directory, and SeqDesk auto-detects FASTQ files and pairs R1/R2 reads so they can be assigned to samples. This is the path described in the data-flow diagram above.
  2. Live MinKNOW stream ingest (Oxford Nanopore). For ONT runs, an admin can attach a running MinKNOW sequencing run to an order and have reads ingested as MinKNOW writes them, rather than waiting for the run to finish. A long-lived stream-monitor daemon (run outside the web app) watches MinKNOW’s output root and links new FASTQ files to samples by barcode. Admins configure and enable it at Admin → MinKNOW Stream (/admin/minknow-stream); per-order stream control lives under the order’s Sequencing Data → Stream tab, backed by the /api/orders/[id]/stream APIs (start, events, by-barcode, stop). If the stream-monitor daemon isn’t running, nothing is ingested even when the config is saved. The underlying StreamRun / StreamIngestedFile / StreamRunEvent records track each live session.
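
For the file-scan path, R1/R2 pairing can be sketched as grouping file names by a shared prefix. The pattern below assumes Illumina-style names such as `sample_R1_001.fastq.gz` or `sample_R2.fq`; SeqDesk's actual detection rules are not described here, and `pairReads`, `ReadPair`, and `FASTQ_RE` are hypothetical names.

```typescript
interface ReadPair {
  sample: string; // shared file-name prefix before _R1 / _R2
  r1?: string;
  r2?: string;
}

// prefix, then _R1 or _R2, an optional numeric chunk, and a FASTQ extension.
const FASTQ_RE = /^(.+?)_R([12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$/;

// Group FASTQ file names into pairs, keeping first-seen order.
function pairReads(fileNames: string[]): ReadPair[] {
  const byPrefix: Record<string, ReadPair> = Object.create(null);
  const order: string[] = [];
  for (const name of fileNames) {
    const m = FASTQ_RE.exec(name);
    if (!m) {
      continue; // not a recognised paired-read file name
    }
    const prefix = m[1];
    if (!byPrefix[prefix]) {
      byPrefix[prefix] = { sample: prefix };
      order.push(prefix);
    }
    if (m[2] === "1") {
      byPrefix[prefix].r1 = name;
    } else {
      byPrefix[prefix].r2 = name;
    }
  }
  return order.map((p) => byPrefix[p]);
}
```

Files that do not carry an `_R1` / `_R2` marker, such as typical single-file Nanopore output, are skipped by this pattern and would need separate handling.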

Two Roles

SeqDesk uses a simple role model:

Researcher — creates orders, adds samples, views results. Can only see their own data (unless department sharing is enabled).

Facility Admin — full access to all orders, studies, and settings. Can configure the system, run pipelines, and submit to ENA.
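
The visibility rule for the two roles can be expressed as a single predicate. This is a sketch under assumptions: the role identifiers, the `department` fields, and the function name `canViewOrder` are hypothetical, and department sharing is modelled as a simple boolean flag.

```typescript
type Role = "researcher" | "facility_admin";

interface User {
  id: string;
  role: Role;
  department?: string;
}

interface OrderRef {
  ownerId: string;
  ownerDepartment?: string;
}

// Decide whether a user may see an order.
function canViewOrder(user: User, order: OrderRef, departmentSharing: boolean): boolean {
  if (user.role === "facility_admin") {
    return true; // admins have full access to all orders
  }
  if (order.ownerId === user.id) {
    return true; // researchers always see their own orders
  }
  // With department sharing enabled, colleagues in the same department can also view.
  return (
    departmentSharing &&
    user.department !== undefined &&
    user.department === order.ownerDepartment
  );
}
```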

Self-Hosted by Design

SeqDesk is designed to run on your own infrastructure:

  • No external dependencies — everything runs locally (database, file storage, pipeline execution)
  • Your data stays on your network — sequencing files are referenced by path, never uploaded to a remote service
  • Single-command install: npm i -g seqdesk && seqdesk gets you running
  • Automatic updates — built-in update system with backup and rollback support