Submission Config Guide

Introduction

A submission configuration file (.config) is required to submit samples to the NCBI or GISAID databases through TOSTADAS.

This file contains your credentials and the submission properties for each of the NCBI databases.

Before populating the submission configuration file, make sure that you have NCBI access (a working username and password for authentication). More information on how to gain access to NCBI and/or GISAID can be found here.

General Format & Content

The fields and corresponding example values can be found here: Default Config.

Specifying NCBI Databases

The submission configuration file contains multiple fields for specifying which NCBI databases to submit samples to.

These fields are ONLY used when the submission_database Nextflow parameter is set to 'submit'. Otherwise, the database named by that parameter ('sra', 'gisaid', 'biosample', or 'joint_sra_biosample') takes precedence over the contents of the submission config. This lets you select a single database directly through the Nextflow parameter, or a more complex combination of databases through the submission config, depending on your needs.

The following are the specific fields that toggle between databases for submission:

submit_Genbank: True
submit_GISAID: True
submit_SRA: False
submit_BioSample: True
joint_SRA_BioSample_submission: True

TOSTADAS will submit to each database that is set to True and ignore all others.
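
The precedence rule and the toggle fields can be sketched together in a few lines of Python. This is only an illustration of the behavior described in this section, not the TOSTADAS implementation: the function name `resolve_databases` and the idea of passing the config as a plain dict are assumptions made for the example.

```python
# Illustrative sketch only; resolve_databases is a hypothetical helper,
# not a function provided by TOSTADAS.

# Single-database values accepted by the submission_database parameter.
NF_DATABASES = {"sra", "gisaid", "biosample", "joint_sra_biosample"}

# Toggle fields as they appear in the submission config.
TOGGLE_FIELDS = [
    "submit_Genbank",
    "submit_GISAID",
    "submit_SRA",
    "submit_BioSample",
    "joint_SRA_BioSample_submission",
]


def resolve_databases(submission_database, config):
    """Return the selected databases according to the precedence rule."""
    if submission_database == "submit":
        # Only in this case are the config toggles consulted.
        return [field for field in TOGGLE_FIELDS if config.get(field) is True]
    if submission_database in NF_DATABASES:
        # The Nextflow parameter wins; the config toggles are ignored.
        return [submission_database]
    raise ValueError(f"unrecognized submission_database: {submission_database!r}")


example_config = {
    "submit_Genbank": True,
    "submit_GISAID": True,
    "submit_SRA": False,
    "submit_BioSample": True,
    "joint_SRA_BioSample_submission": True,
}
print(resolve_databases("submit", example_config))
print(resolve_databases("sra", example_config))
```

With the example toggles above, the 'submit' case selects every field set to True, while the 'sra' case selects only SRA regardless of the toggles.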

Information For Each Field

Personal Fields

| Field Name | Description | Input Required |
| --- | --- | --- |
| contact_email1 | General email for potential contact | Yes (string) |
| contact_email2 | A second general email for potential contact | Yes (string) |
| submitter_info | Contains subfields for the submitter's personal information | Yes (string) |
| NCBI / username | Your personal username credential for NCBI | Yes (string) |
| NCBI / password | Your personal password credential for NCBI | Yes (string) |
| organization_name | Name of the organization or company you are affiliated with | Yes (string) |
| ncbi / citation_address | Contains subfields for information about the location for sample generation, for citation purposes | Yes (string) |
| ncbi / publication_title | Name of the publication or project | Yes (string) |
| ncbi / BioProject | Collection of biological data related to a single initiative, originating from a single organization or from a consortium | Yes (string) |
| ncbi / Center_title | Name of your affiliated center / group | Yes (string) |
| gisaid / username | Your personal username credential for GISAID | Yes (string) |
| gisaid / password | Your personal password credential for GISAID | Yes (string) |
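
Since every field in this table is required, it can help to check for empty or missing values before starting a submission. The sketch below is a hypothetical pre-flight check, not part of TOSTADAS. It assumes the config has already been loaded into a nested dict and that "section / field" in the table means `field` nested under `section`; the field paths used are taken from the table as written.

```python
# Hypothetical pre-flight check; missing_fields is not a TOSTADAS function.
# Assumption: "gisaid / username" in the table means config["gisaid"]["username"].

REQUIRED_PATHS = [
    ("contact_email1",),
    ("organization_name",),
    ("gisaid", "username"),
    ("gisaid", "password"),
]


def missing_fields(config, required=REQUIRED_PATHS):
    """Return the required field paths that are absent or empty."""
    missing = []
    for path in required:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, ""):
            missing.append(" / ".join(path))
    return missing


example_config = {
    "contact_email1": "someone@example.org",  # placeholder value
    "organization_name": "",
    "gisaid": {"username": "example_user"},   # placeholder value
}
print(missing_fields(example_config))
```

Extend `REQUIRED_PATHS` with the remaining fields from the table, using the exact key names and casing found in your own config file.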

General Fields

| Field Name | Description | Input Required |
| --- | --- | --- |
| submit_Genbank | Toggle submission of samples to Genbank | Yes (bool) |
| submit_GISAID | Toggle submission of samples to GISAID | Yes (bool) |
| submit_SRA | Toggle submission of samples to SRA | Yes (bool) |
| submit_BioSample | Toggle submission of samples to BioSample | Yes (bool) |
| joint_SRA_BioSample_submission | Toggle joint submission of samples to SRA + BioSample | Yes (bool) |
| genbank_submission_type | Method or type for submitting to Genbank (e.g. table2asn) | Yes (string) |
| authorset | Field or name designating the authors per sample | Yes (string) |
| organism_name | Name of the organism for submitted samples | Yes (string) |
| metadata_file_sep | The separator (delimiter) used in your metadata file (e.g. \t) | Yes (string) |
| fasta_sample_name_col | Name of the column containing the sample names | Yes (string) |
| collection_date_col | Name of the column containing the collection dates for the samples | Yes (string) |
| baseline_surveillance | Whether baseline surveillance occurred or not | Yes (bool) |
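
The three metadata-related fields (metadata_file_sep, fasta_sample_name_col, collection_date_col) must agree with the metadata file itself. The following sketch shows one way to verify that before submitting. It is a hypothetical check, not TOSTADAS code: the function name and the column names `sample_name` and `collection_date` in the example are made up, and it assumes the separator has been loaded as the actual one-character delimiter (a real tab, not the two characters `\` and `t`).

```python
import csv
import io


def check_metadata_columns(metadata_text, config):
    """Return the configured column names that are absent from the metadata header."""
    reader = csv.DictReader(
        io.StringIO(metadata_text),
        delimiter=config["metadata_file_sep"],
    )
    header = reader.fieldnames or []
    wanted = [config["fasta_sample_name_col"], config["collection_date_col"]]
    return [column for column in wanted if column not in header]


# Hypothetical config values and metadata content, for illustration only.
example_config = {
    "metadata_file_sep": "\t",
    "fasta_sample_name_col": "sample_name",
    "collection_date_col": "collection_date",
}
example_metadata = "sample_name\tcollection_date\nS1\t2024-01-01\n"
print(check_metadata_columns(example_metadata, example_config))
```

An empty list means both configured columns were found in the header; any names returned point to a mismatch between the config and the metadata file, or to a wrong separator.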

General NCBI Fields

| Field Name | Description | Input Required |
| --- | --- | --- |
| ncbi_org_id | Organization ID for NCBI | Yes (string) |
| ncbi / hostname | The FTP host name for NCBI | Yes (string) |
| ncbi / api_url | URL for the NCBI API | Yes (string) |
| ncbi_ftp_path_to_submission_folders | Path to the submission folders at endpoint | Yes (string) |

NCBI Databases Fields (General)

| Field Name | Description | Input Required |
| --- | --- | --- |
| ncbi / SRA_file_location | Location of SRA file | Yes (string) |
| ncbi / SRA_file_column(1-3) | Name of column containing SRA file information | Yes (string) |
| ncbi / Genbank_organization_type | Type of organization for Genbank | Yes (string) |
| ncbi / Genbank_organization_role | Role for Genbank | Yes (string) |
| ncbi / Genbank_spuid_namespace | The namespace for Genbank | Yes (string) |
| ncbi / Genbank_auto_remove_sequences_that_fail_qc | Whether or not to remove sequences that fail QC for Genbank | Yes (bool) |

NCBI Databases Fields (Specific)

| Field Name | Description | Input Required |
| --- | --- | --- |
| gisaid / column_names | The column names from the metadata sheet that correspond to various GISAID database variables | Yes (string) |
| SRA_attributes / column_names | The column names from the metadata sheet that correspond to various SRA database variables | Yes (string) |
| BioSample_attributes / column_names | The column names from the metadata sheet that correspond to various BioSample database variables | Yes (string) |
| genbank_cmt_metadata / column_names | The column names from the metadata sheet that correspond to various Genbank comment (cmt) variables | Yes (string) |
| genbank_src_metadata / column_names | The column names from the metadata sheet that correspond to various Genbank source (src) variables | Yes (string) |
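
To make the role of a column_names entry more concrete, the sketch below applies such a mapping to one metadata row. It is an illustration under stated assumptions, not the TOSTADAS implementation: it assumes the mapping goes from metadata column name to database variable name (check the Default Config for the actual direction and layout), and every column and variable name in the example is invented.

```python
# Hypothetical illustration; map_row is not a TOSTADAS function.
# Assumption: column_names maps metadata column name -> database variable name.


def map_row(row, column_names):
    """Rename the mapped columns of one metadata row.

    Returns (mapped, absent), where mapped holds database variable -> value
    and absent lists mapped columns that the row does not contain.
    """
    mapped = {}
    absent = []
    for column, db_variable in column_names.items():
        if column in row:
            mapped[db_variable] = row[column]
        else:
            absent.append(column)
    return mapped, absent


# Invented names, for illustration only.
example_column_names = {
    "sample_name": "isolate",
    "collection_date": "collection_date",
    "host": "host",
}
example_row = {"sample_name": "S1", "collection_date": "2024-01-01", "extra": "x"}
print(map_row(example_row, example_column_names))
```

Columns that are not listed in the mapping (such as `extra` here) are simply left out, and mapped columns missing from the row are reported so that the config or the metadata sheet can be corrected.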

Last update: 2025-03-26