Submission Config Guide
Introduction
A submission configuration file (.config) is required to submit samples to the NCBI or GISAID databases through TOSTADAS.
This file contains the user's credentials and the submission properties for each of the target databases.
Before populating the submission configuration file, ensure that you have NCBI access (a working username and password for authentication). Information on how to gain access to NCBI and/or GISAID can be found here.
General Format & Content
The fields and corresponding example values can be found here: Default Config.
Specifying NCBI Databases
The submission configuration file contains multiple fields for specifying which NCBI databases to submit samples to.
These fields are ONLY used when the Nextflow parameter submission_database is set to 'submit'. For any other value ('sra', 'gisaid', 'biosample', 'joint_sra_biosample'), the database named by the parameter takes precedence over the contents of the submission config. This lets you target a single database directly through the Nextflow parameter, or specify a more complex combination of databases in the submission config, depending on your needs.
The following are the specific fields that toggle between databases for submission:
submit_Genbank: True
submit_GISAID: True
submit_SRA: False
submit_BioSample: True
joint_SRA_BioSample_submission: True
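The precedence rule described above can be sketched in a few lines of Python. This is only an illustration, not TOSTADAS code: the function name `resolve_databases`, the `TOGGLE_TO_DATABASE` mapping, and the short database labels are assumptions introduced for this example.

```python
# Hypothetical helper, not part of TOSTADAS.
# Maps each toggle field in the submission config to a short database label.
TOGGLE_TO_DATABASE = {
    "submit_Genbank": "genbank",
    "submit_GISAID": "gisaid",
    "submit_SRA": "sra",
    "submit_BioSample": "biosample",
    "joint_SRA_BioSample_submission": "joint_sra_biosample",
}


def resolve_databases(submission_database, config):
    """Return the database labels that would be targeted for submission."""
    if submission_database == "submit":
        # Defer to the toggle fields in the submission config.
        return [
            label
            for field, label in TOGGLE_TO_DATABASE.items()
            if config.get(field) is True
        ]
    # Any other value names one database and overrides the config toggles.
    return [submission_database]


# Toggle values copied from the example fields above.
example_config = {
    "submit_Genbank": True,
    "submit_GISAID": True,
    "submit_SRA": False,
    "submit_BioSample": True,
    "joint_SRA_BioSample_submission": True,
}

from_config = resolve_databases("submit", example_config)
from_param = resolve_databases("sra", example_config)
print(from_config)
print(from_param)
```

With 'submit', only the toggles set to True are returned; with 'sra', the config toggles are ignored entirely.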
Information For Each Field
Personal Fields
Field Name | Description | Input Required |
---|---|---|
contact_email1 | General email for potential contact | Yes (string) |
contact_email2 | A second general email for potential contact | Yes (string) |
submitter_info | Contains subfields for the submitter's personal information | Yes (string) |
NCBI / username | Your personal username credential for NCBI | Yes (string) |
NCBI / password | Your personal password credential for NCBI | Yes (string) |
organization_name | Name of the organization or company you are affiliated with | Yes (string) |
ncbi / citation_address | Contains subfields for information about the location for sample generation, for citation purposes | Yes (string) |
ncbi / publication_title | Name of the publication or project | Yes (string) |
ncbi / BioProject | Collection of biological data related to a single initiative, originating from a single organization or from a consortium | Yes (string) |
ncbi / Center_title | Name of your affiliated center / group | Yes (string) |
gisaid / username | Your personal username credential for GISAID | Yes (string) |
gisaid / password | Your personal password credential for GISAID | Yes (string) |
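Because every personal field is required, it can help to check for empty credentials before starting a submission. The sketch below assumes that a name such as `ncbi / username` denotes a key nested under a lower-case parent key, and that the config has already been loaded into a Python dictionary. Both are assumptions, and `missing_fields` is a hypothetical helper rather than a TOSTADAS function.

```python
# Hypothetical nested layout; the real file may organize these keys differently.
REQUIRED_PATHS = [
    ("ncbi", "username"),
    ("ncbi", "password"),
    ("gisaid", "username"),
    ("gisaid", "password"),
]


def missing_fields(config, required_paths=REQUIRED_PATHS):
    """Return the 'parent / child' names of required fields that are absent or blank."""
    missing = []
    for path in required_paths:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        # Treat a missing key, None, or a whitespace-only string as not provided.
        if node is None or (isinstance(node, str) and not node.strip()):
            missing.append(" / ".join(path))
    return missing


# Placeholder values only; never commit real credentials.
example = {
    "ncbi": {"username": "my_user", "password": ""},
    "gisaid": {"username": "my_user"},
}

problems = missing_fields(example)
print(problems)
```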
General Fields
Field Name | Description | Input Required |
---|---|---|
submit_Genbank | Toggle submission of samples to Genbank | Yes (bool) |
submit_GISAID | Toggle submission of samples to GISAID | Yes (bool) |
submit_SRA | Toggle submission of samples to SRA | Yes (bool) |
submit_BioSample | Toggle submission of samples to BioSample | Yes (bool) |
joint_SRA_BioSample_submission | Toggle submission of samples to joint SRA + BioSample | Yes (bool) |
genbank_submission_type | Method or type for submitting to Genbank (e.g. table2asn) | Yes (string) |
authorset | Field or name designating the authors per sample | Yes (string) |
organism_name | Name of the organism for submitted samples | Yes (string) |
metadata_file_sep | The separator (delimiter) used in your metadata file (e.g. \t) | Yes (string) |
fasta_sample_name_col | Name of the column containing the sample names | Yes (string) |
collection_date_col | Name of column containing the collection dates for the samples | Yes (string) |
baseline_surveillance | Whether baseline surveillance occurred or not | Yes (bool) |
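To illustrate how `metadata_file_sep`, `fasta_sample_name_col`, and `collection_date_col` relate to a metadata file, here is a minimal sketch using Python's standard `csv` module. The column names, the sample rows, and the `read_samples` helper are made up for this example; TOSTADAS may read the metadata differently.

```python
import csv
import io

# Hypothetical values for the three fields described in the table above.
config = {
    "metadata_file_sep": "\t",
    "fasta_sample_name_col": "sample_name",
    "collection_date_col": "collection_date",
}

# Made-up tab-separated metadata, held in memory to keep the example self-contained.
metadata_text = (
    "sample_name\tcollection_date\thost\n"
    "sample_1\t2022-07-01\tHomo sapiens\n"
    "sample_2\t2022-07-03\tHomo sapiens\n"
)


def read_samples(handle, config):
    """Return (sample name, collection date) pairs using the configured columns."""
    reader = csv.DictReader(handle, delimiter=config["metadata_file_sep"])
    name_col = config["fasta_sample_name_col"]
    date_col = config["collection_date_col"]
    return [(row[name_col], row[date_col]) for row in reader]


samples = read_samples(io.StringIO(metadata_text), config)
print(samples)
```

If the separator in the config does not match the file, the header is read as a single column and the lookups raise a `KeyError`, which is a quick way to spot a misconfigured `metadata_file_sep`.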
General NCBI Fields
Field Name | Description | Input Required |
---|---|---|
ncbi_org_id | Organization ID for NCBI | Yes (string) |
ncbi / hostname | The FTP host name for NCBI | Yes (string) |
ncbi / api_url | URL for the NCBI API | Yes (string) |
ncbi_ftp_path_to_submission_folders | Path to the submission folders at endpoint | Yes (string) |
NCBI Databases Fields (General)
Field Name | Description | Input Required |
---|---|---|
ncbi / SRA_file_location | Location of SRA file | Yes (string) |
ncbi / SRA_file_column(1-3) | Name of column containing SRA file information | Yes (string) |
ncbi / Genbank_organization_type | Type of organization for Genbank | Yes (string) |
ncbi / Genbank_organization_role | Role of the organization for Genbank | Yes (string) |
ncbi / Genbank_spuid_namespace | The SPUID namespace for Genbank | Yes (string) |
ncbi / Genbank_auto_remove_sequences_that_fail_qc | Whether or not to remove sequences that fail QC for Genbank | Yes (bool) |
NCBI Databases Fields (Specific)
Field Name | Description | Input Required |
---|---|---|
gisaid / column_names | The column names from metadata sheet that correspond to various GISAID database variables | Yes (string) |
SRA_attributes / column_names | The column names from metadata sheet that correspond to various SRA database variables | Yes (string) |
BioSample_attributes / column_names | The column names from metadata sheet that correspond to various BioSample database variables | Yes (string) |
genbank_cmt_metadata / column_names | The column names from metadata sheet that correspond to various Genbank structured comment (cmt) variables | Yes (string) |
genbank_src_metadata / column_names | The column names from metadata sheet that correspond to various Genbank source (src) variables | Yes (string) |
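The `column_names` fields tie columns in the metadata sheet to database variables. The sketch below shows one way such a mapping could be applied to a single metadata row. The direction of the mapping (database variable to metadata column), the variable names, and the `select_columns` helper are all assumptions made for illustration; check the Default Config for the actual layout.

```python
def select_columns(row, column_names):
    """Build {database_variable: value} from one metadata row.

    `column_names` is assumed to map a database variable name to the
    metadata column that holds its value.
    """
    missing = [col for col in column_names.values() if col not in row]
    if missing:
        raise KeyError(f"metadata is missing columns: {missing}")
    return {db_var: row[col] for db_var, col in column_names.items()}


# Hypothetical mapping and metadata row.
sra_column_names = {
    "instrument_model": "sequencer",
    "library_strategy": "lib_strategy",
}
metadata_row = {
    "sample_name": "sample_1",
    "sequencer": "Illumina MiSeq",
    "lib_strategy": "AMPLICON",
}

sra_record = select_columns(metadata_row, sra_column_names)
print(sra_record)
```

Failing early on a missing column makes a mismatch between the config and the metadata sheet visible before anything is submitted.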