Submitting to ENA

SeqDesk automates the submission process to the European Nucleotide Archive. This page covers the step-by-step submission workflow.

Submission Process

Prepare the study

Ensure all validation requirements are met — a non-empty title and description, at least one sample, and a tax ID on every sample.
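The checks above can be sketched as a small pre-flight function. This is an illustrative sketch only: the `Study` and `Sample` classes, their field names, and the error strings are hypothetical and are not SeqDesk's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    alias: str
    tax_id: Optional[str] = None  # hypothetical field name


@dataclass
class Study:
    title: str = ""
    description: str = ""
    samples: List[Sample] = field(default_factory=list)


def validation_errors(study: Study) -> List[str]:
    """Return human-readable problems; an empty list means ready to submit."""
    errors = []
    if not study.title.strip():
        errors.append("Study title is empty")
    if not study.description.strip():
        errors.append("Study description is empty")
    if not study.samples:
        errors.append("Study has no samples")
    for sample in study.samples:
        if not sample.tax_id:
            errors.append(f"Sample {sample.alias} has no tax ID")
    return errors
```

Returning a list rather than raising on the first problem lets an admin see every missing field in one pass.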

Submit the study

The facility admin initiates submission from the study page or the ENA Submissions page (/submissions). The system:

  1. Generates Study (PROJECT) XML with the alias, title, and description
  2. Generates Submission XML (the ADD action wrapper for ENA’s API)
  3. Sends a multipart POST to ENA’s submit endpoint
  4. Parses the receipt XML for the project accession

On success, the study receives a studyAccessionId (format: PRJEB...).
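As a rough illustration of steps 2 and 4, the sketch below builds a submission XML with an ADD action and reads the project accession out of a receipt. The function names are hypothetical, and the receipt layout (a `RECEIPT` root with a `success` attribute, a `PROJECT` child carrying `accession`, and `ERROR` messages) follows ENA's documented receipt format; how SeqDesk handles it internally is not shown here.

```python
import xml.etree.ElementTree as ET


def build_submission_xml(action: str = "ADD") -> str:
    """Wrap a single action (ADD by default) in a SUBMISSION document."""
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    ET.SubElement(ET.SubElement(actions, "ACTION"), action)
    return ET.tostring(submission, encoding="unicode")


def project_accession(receipt_xml: str) -> str:
    """Return the PROJECT accession from a receipt, or raise ValueError."""
    receipt = ET.fromstring(receipt_xml)
    if receipt.get("success") != "true":
        errors = [e.text or "" for e in receipt.iter("ERROR")]
        raise ValueError("ENA rejected the submission: " + "; ".join(errors))
    project = receipt.find("PROJECT")
    if project is None or not project.get("accession"):
        raise ValueError("Receipt has no PROJECT accession")
    return project.get("accession")
```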

Submit samples

In the same request, after the study is registered, the samples are submitted together as a single SAMPLE_SET:

  1. Sample XML is generated with each sample’s alias, title, TAXON_ID, optional scientific name, and MIxS metadata as sample attributes
  2. The set is POSTed to ENA
  3. sampleAccessionNumber (format: ERS...) is stored on each sample from the receipt

The XML registration path stores only sampleAccessionNumber. The BioSample accession (SAMEA..., the biosampleNumber field) is written only by the submg pipeline — see Submitting reads and assemblies with submg below.
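A minimal sketch of step 3, assuming the receipt lists one `SAMPLE` element per registered sample with `alias` and `accession` attributes (ENA's documented receipt shape). The function name is hypothetical. Any nested BioSample `EXT_ID` is deliberately ignored, mirroring the note above that this path stores only the ERS accession.

```python
import xml.etree.ElementTree as ET


def sample_accessions(receipt_xml: str) -> dict:
    """Map each sample alias to its ERS accession from a receipt."""
    receipt = ET.fromstring(receipt_xml)
    return {
        sample.get("alias"): sample.get("accession")
        for sample in receipt.findall("SAMPLE")
        if sample.get("alias") and sample.get("accession")
    }
```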

Track accession numbers

The study accession and sample accessions are stored on the corresponding records and surfaced in the study detail page and the Submissions Dashboard.

XML Generation

SeqDesk generates ENA-compliant XML for each entity type:

Study XML

<PROJECT_SET>
  <PROJECT alias="my-study-alias">
    <TITLE>Study Title</TITLE>
    <DESCRIPTION>Study description...</DESCRIPTION>
    <SUBMISSION_PROJECT>
      <SEQUENCING_PROJECT/>
    </SUBMISSION_PROJECT>
  </PROJECT>
</PROJECT_SET>

Sample XML

<SAMPLE_SET>
  <SAMPLE alias="HG001">
    <TITLE>Human Gut Sample 1</TITLE>
    <SAMPLE_NAME>
      <TAXON_ID>408170</TAXON_ID>
      <SCIENTIFIC_NAME>human gut metagenome</SCIENTIFIC_NAME>
    </SAMPLE_NAME>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (country and/or sea)</TAG>
        <VALUE>Germany</VALUE>
      </SAMPLE_ATTRIBUTE>
      <!-- Additional MIxS fields -->
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
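A document of this shape could be produced with Python's standard `xml.etree.ElementTree`, which also escapes special characters in titles and values. The sketch below is an assumption about one way to do it, not SeqDesk's generator; the input dictionary keys (`alias`, `title`, `tax_id`, `scientific_name`, `attributes`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_sample_set_xml(samples: list) -> str:
    """Build a SAMPLE_SET document from a list of plain dictionaries."""
    sample_set = ET.Element("SAMPLE_SET")
    for s in samples:
        sample = ET.SubElement(sample_set, "SAMPLE", alias=s["alias"])
        ET.SubElement(sample, "TITLE").text = s["title"]
        name = ET.SubElement(sample, "SAMPLE_NAME")
        ET.SubElement(name, "TAXON_ID").text = str(s["tax_id"])
        # The scientific name is optional, so emit it only when present.
        if s.get("scientific_name"):
            ET.SubElement(name, "SCIENTIFIC_NAME").text = s["scientific_name"]
        attrs = s.get("attributes") or {}
        if attrs:
            attributes = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")
            for tag, value in attrs.items():
                attribute = ET.SubElement(attributes, "SAMPLE_ATTRIBUTE")
                ET.SubElement(attribute, "TAG").text = tag
                ET.SubElement(attribute, "VALUE").text = str(value)
    return ET.tostring(sample_set, encoding="unicode")
```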

ENA API Endpoints

Environment  URL
Test         https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/
Production   https://www.ebi.ac.uk/ena/submit/drop-box/submit/

Requests use HTTP Basic authentication with your Webin credentials and multipart form data containing the XML files.
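To make the request shape concrete, here is a standard-library sketch that assembles (but does not send) such a request. The function name and the `files` mapping are hypothetical; the form field names used in the usage note (`SUBMISSION`, `PROJECT`, `SAMPLE`) are the ones ENA's programmatic submission documentation uses, and it is an assumption that SeqDesk names its parts the same way.

```python
import base64
import urllib.request
import uuid

ENA_ENDPOINTS = {
    True: "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/",   # test
    False: "https://www.ebi.ac.uk/ena/submit/drop-box/submit/",     # production
}


def build_submit_request(username: str, password: str, files: dict,
                         test: bool = True) -> urllib.request.Request:
    """Assemble a multipart POST with Basic auth; nothing is sent here."""
    boundary = uuid.uuid4().hex
    parts = []
    for field_name, xml in files.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{field_name.lower()}.xml"\r\n'
            "Content-Type: application/xml\r\n\r\n"
            f"{xml}\r\n"
        )
    body = ("".join(parts) + f"--{boundary}--\r\n").encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ENA_ENDPOINTS[test],
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing `{"SUBMISSION": submission_xml, "PROJECT": project_xml}` yields a request object that `urllib.request.urlopen` could send; defaulting `test` to `True` keeps an accidental call away from the production server.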

Connection test

The Test Connection button in ENA settings does not register anything: it sends a minimal multipart POST to the drop-box endpoint with a VALIDATE action and a throwaway project. A receipt with success="true" or success="false" means the credentials authenticated; a 401 means they did not. The username must have the form Webin-12345 (it is checked against the ^Webin-\d+$ pattern) before the request is even sent.
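The decision logic described here can be sketched as two small helpers. The function names and the three result labels are hypothetical; only the username pattern, the 401 rule, and the "any receipt means authenticated" rule come from the text above.

```python
import re
import xml.etree.ElementTree as ET

WEBIN_USERNAME = re.compile(r"^Webin-\d+$")


def username_is_valid(username: str) -> bool:
    """Check the Webin username format before any request is sent."""
    return WEBIN_USERNAME.match(username) is not None


def interpret_connection_test(status_code: int, body: str) -> str:
    """Classify a response as 'authenticated', 'rejected', or 'unknown'."""
    if status_code == 401:
        return "rejected"
    try:
        receipt = ET.fromstring(body)
    except ET.ParseError:
        return "unknown"
    # A receipt proves authentication even when validation itself failed.
    if receipt.tag == "RECEIPT" and receipt.get("success") in ("true", "false"):
        return "authenticated"
    return "unknown"
```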

Broker-account mode

If the install runs as an ENA broker (configured in ENA settings), the registration path adds a center_name attribute — taken from the configured center name — to the PROJECT and SAMPLE XML it generates, so records are attributed to the brokering center. Broker mode applies to the XML registration path; the submg pipeline below does not inject a center name.
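One way to picture the broker adjustment is as a post-processing step over already generated XML. This is a sketch under that assumption; SeqDesk may set the attribute while generating the XML instead, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def add_center_name(xml_text: str, center_name: Optional[str]) -> str:
    """Set center_name on every PROJECT and SAMPLE element when configured."""
    root = ET.fromstring(xml_text)
    if center_name:
        for tag in ("PROJECT", "SAMPLE"):
            for element in root.iter(tag):
                element.set("center_name", center_name)
    return ET.tostring(root, encoding="unicode")
```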

Submission Tracking

Each study registration creates one Submission record (the XML path creates study submission records; it does not create separate per-sample records):

Field             Description
submissionType    STUDY for this path
status            PENDING → ACCEPTED / PARTIAL / ERROR (an operator can set CANCELLED)
xmlContent        The combined study + sample + submission XML, kept for debugging
response          JSON wrapper: server, test flag, message, the steps[] timeline, and a nested receipt (with the raw receipt XML) and any errors
accessionNumbers  JSON map, e.g. { "study": "PRJEB12345", "<sampleId>": "ERS..." }

There is no separate receiptXml, requestPayload, or error-message column — the receipt and error text live inside the response JSON. See the Submissions Dashboard for the full record shape and the status lifecycle.
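Reading the accessionNumbers map back is straightforward, since the only reserved key is "study" and every other key is a sample ID. The helper below is a hypothetical sketch; the accession values in the usage note are placeholders.

```python
import json
from typing import Dict, Optional, Tuple


def split_accessions(accession_numbers_json: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Separate the study accession from the per-sample accessions."""
    data = json.loads(accession_numbers_json or "{}")
    study = data.get("study")
    samples = {key: value for key, value in data.items() if key != "study"}
    return study, samples
```

For example, `split_accessions('{"study": "PRJEB12345", "42": "ERS0000001"}')` returns the study accession and a one-entry sample map keyed by the sample ID.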

Resubmitting

The XML path does not batch or retry samples individually. If a registration fails or partially succeeds, fix the underlying records and re-run it from the Submissions Dashboard — “Retry” simply re-POSTs /api/admin/submissions for the same study, regenerating fresh XML. When a study already has an accession, the re-run skips re-registering the project and submits only the samples that still lack an accession.
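The re-run rule can be expressed in a few lines. The function name and the dictionary-based sample records are hypothetical; the `sampleAccessionNumber` key is the field named earlier on this page.

```python
from typing import List, Optional, Tuple


def plan_rerun(study_accession: Optional[str], samples: List[dict]) -> Tuple[bool, List[dict]]:
    """Decide whether to register the project and which samples to send."""
    register_project = not study_accession
    pending = [s for s in samples if not s.get("sampleAccessionNumber")]
    return register_project, pending
```

With a study that already holds an accession and three samples of which one is registered, the plan skips the project and returns the two remaining samples.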

Submitting reads and assemblies with submg

The XML path above registers the study and samples (metadata) only. To submit the actual sequencing data (reads, the assembly, and optional bins), SeqDesk runs the submg pipeline (“Submit to ENA”), which wraps the submg CLI. It runs as a normal pipeline run: it is visible to users but admin-triggered (userCanStart is false), can be scoped to a whole study or a subset of samples, and writes accessions back to the database when it finishes.

Prerequisites

Per sample, submg requires:

  • Paired-end reads — each read must have both an R1 and an R2 FASTQ; submg rejects samples with no paired-end read files.
  • MD5 checksums on both R1 and R2.
  • An assembly FASTA that exists on disk (run the MAG pipeline first if a sample has no assembly).
  • A study ENA accession (PRJ...) — the study must already be registered via the XML path.
  • Checklist metadata: collection date and geographic location (country and/or sea) are required; other checklist fields are passed through.

When the ENA target is the Test server, the study’s test registration must also be less than 24 hours old, because ENA expires test registrations.
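The prerequisites above, including the 24-hour rule for the Test server, could be checked per sample roughly as follows. Everything about the record layout is an assumption made for illustration: the dictionary keys (`reads`, `r1`, `r2`, `r1_md5`, `r2_md5`, `assembly_path`, `checklist`, `accession`, `test_registered_at`) are hypothetical, as is the function name.

```python
import os
from datetime import datetime, timedelta, timezone

REQUIRED_CHECKLIST = ("collection date", "geographic location (country and/or sea)")


def submg_blockers(sample, study, test_server, now=None, exists=os.path.isfile):
    """Return the reasons a sample cannot be submitted; empty means ready."""
    problems = []
    reads = sample.get("reads") or []
    if not reads or any(not (r.get("r1") and r.get("r2")) for r in reads):
        problems.append("every read needs both an R1 and an R2 FASTQ")
    if any(not (r.get("r1_md5") and r.get("r2_md5")) for r in reads):
        problems.append("MD5 checksums missing on R1 or R2")
    assembly = sample.get("assembly_path")
    if not assembly or not exists(assembly):
        problems.append("assembly FASTA not found on disk")
    if not (study.get("accession") or "").startswith("PRJ"):
        problems.append("study has no ENA accession (PRJ...)")
    checklist = sample.get("checklist") or {}
    for key in REQUIRED_CHECKLIST:
        if not checklist.get(key):
            problems.append(f"checklist field missing: {key}")
    if test_server:
        # Test registrations expire, so a stale one blocks the run.
        registered_at = study.get("test_registered_at")
        now = now or datetime.now(timezone.utc)
        if registered_at is None or now - registered_at >= timedelta(hours=24):
            problems.append("test registration is missing or older than 24 hours")
    return problems
```

The `exists` parameter is there only so the file check can be replaced in tests without touching the disk.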

An assembly coverage value is resolved from sample or order custom fields (coverage_depth, coverage_value, target_coverage, and similar keys). If none is found it defaults to 1 and the run emits a warning so you know the coverage was assumed rather than measured.
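A sketch of that fallback is shown below. The three keys come from the text above; the lookup order (sample fields before order fields, keys in the listed order), the rejection of non-positive values, and the function name are assumptions.

```python
import warnings

COVERAGE_KEYS = ("coverage_depth", "coverage_value", "target_coverage")


def resolve_coverage(sample_fields: dict, order_fields: dict) -> float:
    """Return the first usable coverage value, or 1.0 with a warning."""
    for fields in (sample_fields, order_fields):
        for key in COVERAGE_KEYS:
            raw = fields.get(key)
            if raw is None or raw == "":
                continue
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # skip unparseable entries instead of failing the run
            if value > 0:
                return value
    warnings.warn("No coverage value found; assuming 1", stacklevel=2)
    return 1.0
```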

What it submits

submg builds a YAML config per sample and calls submg submit with --submit_samples --submit_reads --submit_assembly (and --submit_bins when bins are present and enabled). When it completes, SeqDesk parses the submission logs and writes back:

  • sampleAccessionNumber (ERS...) and biosampleNumber (SAMEA...) on each sample
  • run and experiment accessions on each read
  • the assembly accession (ERZ...)
  • bin accessions when bins were submitted

This BioSample (SAMEA...) write-back is unique to the submg path.
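The flag selection described in this section can be sketched as follows. Only the flags named on this page are included; a real invocation needs further arguments (such as the location of the per-sample YAML config) that are not specified here, so they are left out rather than guessed. The function name is hypothetical.

```python
from typing import List


def submg_submit_args(has_bins: bool, bins_enabled: bool) -> List[str]:
    """Return the submg argument list, adding --submit_bins only when applicable."""
    args = ["submg", "submit", "--submit_samples", "--submit_reads", "--submit_assembly"]
    if has_bins and bins_enabled:
        args.append("--submit_bins")
    return args
```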

Test vs Production

Feature            Test Mode                   Production
Server             wwwdev.ebi.ac.uk            www.ebi.ac.uk
Data persistence   24 hours                    Permanent
Accession numbers  Valid format but temporary  Permanent public accessions
Purpose            Validation and testing      Real submissions

Always validate with test mode before submitting to production. Test submissions expire after 24 hours and do not create permanent records.