GISAID, short for the Global Initiative on Sharing All Influenza Data, is an organization that manages a restricted-access database containing genomic sequence data of select viruses, primarily influenza viruses. The database has expanded to include the coronavirus responsible for the COVID-19 pandemic as well as other pathogens.
For all GISAID submissions, `seqsender` uses GISAID’s Command Line Interface tools (CLIs) to batch upload metadata and sequence data to GISAID’s databases. Before performing a batch upload to the EpiCoV database, submitters must obtain the GISAID CLI for EpiCoV and store it as `gisaid_cli` within a submission directory of choice (e.g., `submission_dir`).

After obtaining the GISAID CLI for EpiCoV, submitters must also prepare the required files (such as `config.yaml`, `metadata.csv`, `sequence.fasta`, raw reads, etc.) and store them in a submission folder of choice (e.g., `submission_name`) within the parent submission directory (e.g., `submission_dir`). That way, `seqsender` can pick up the necessary files in that folder, generate the submission files, and then batch upload them to the database of choice.
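
The folder layout described above can be checked before running a submission. The following is a minimal sketch, not part of `seqsender`: the helper name `missing_submission_files` is hypothetical, and the file names are simply the examples given in this guide (raw reads and `gisaid_cli` are not checked).

```python
from pathlib import Path

# File names taken from the examples in this guide; adjust them if yours differ.
REQUIRED_FILES = ("config.yaml", "metadata.csv", "sequence.fasta")


def missing_submission_files(submission_dir, submission_name, required=REQUIRED_FILES):
    """Return the required file names that are absent from the submission folder."""
    folder = Path(submission_dir) / submission_name
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # "submission_dir" and "submission_name" are the placeholder names used above.
    missing = missing_submission_files("submission_dir", "submission_name")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

An empty list means every example file was found in `submission_dir/submission_name`.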
Here is a list of the required files and their formats. All of them are stored in the submission folder (e.g., `submission_name`):

File | Format |
---|---|
Config file | yaml |
Fasta file | fasta |
Metadata worksheet | csv |

The config file is a yaml file that provides a brief description of the submission and contains the user credentials that allow `seqsender` to authenticate with the database before uploading a submission.
NOTE: When the relevant config option is set to `1`, `seqsender` will submit to GISAID first; then, after all samples have been assigned a GISAID accession number, `seqsender` will proceed to submit to NCBI. This order of submission ensures that samples are linked correctly between the two databases.

The fasta file contains the nucleotide sequences for all samples. See Genbank Fasta Format for more details.
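
Because the metadata worksheet refers to each sequence by its identifier in the fasta file (the `sequence_name` column), it can help to confirm that the two files agree before uploading. The sketch below is illustrative only: the function names are hypothetical, and it assumes the identifier is the first whitespace-delimited word after `>` in each fasta header.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Return the identifier (first word after '>') of every fasta record."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def compare_ids(fasta_text, metadata_text):
    """Return (ids only in the fasta, ids only in the metadata) as sorted lists."""
    in_fasta = set(fasta_ids(fasta_text))
    reader = csv.DictReader(io.StringIO(metadata_text))
    in_metadata = {row["sequence_name"].strip() for row in reader}
    return sorted(in_fasta - in_metadata), sorted(in_metadata - in_fasta)


# Made-up sample data for illustration.
fasta = ">s1\nACGT\n>s2\nGGCC\n"
metadata = "sequence_name,organism\ns1,x\ns3,y\n"
print(compare_ids(fasta, metadata))  # (['s2'], ['s3'])
```

Two empty lists mean every fasta record has a metadata row and vice versa.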
The metadata worksheet is a comma-delimited (csv) file that contains required attributes that are useful for the rapid analysis and trace-back of SARS-CoV-2 cases.
Here is a short description of the fields in the metadata worksheet.
Column_name | Description |
---|---|
sequence_name | Sequence identifier used in fasta file. This is used to create the fasta file for Genbank or GISAID. |
organism | The most descriptive organism name for the samples. If relevant, you can search the organism name in the NCBI Taxonomy database. For FLU, organism must be “Influenza A Virus”. For COV, organism must be “Severe acute respiratory syndrome coronavirus 2”. |
collection_date | The date on which the sample was collected; must be in the ISO format: YYYY-MM-DD. For example: 2020-03-25 |
authors | Citing authors. List of Last, First Middle, suffix separated by a semicolon “;” E.g.: “Baker, Howard Henry, Jr.; Powell, Earl Alexander, III.;” |
gs-virus_name | For example: hCoV-19/Country/SampleID/YYYY. There are four parts, delineated by the forward slash “/” character: hCoV-19, Country, SampleID, and YYYY. |
gs-type | For hCoV-19, this will always be “betacoronavirus”. |
gs-passage | “Original” if the sample was sequenced directly from swabs, otherwise add the name of the cell line (e.g., “Vero”) used to culture the specimen. |
gs-location | Format as “Continent / Country / Region / Sub-region” |
gs-host | For clinical samples, this is “Human”. Otherwise add the species name of the organism from which the sample was originally sourced. |
gs-gender | Synonym for “Biological sex”. Should be “Female”, “Male”, or “Other” |
gs-patient_age | Age in years of the person from whom the specimen was collected. May take format other than integer years, for example, “0.5” (i.e., 6 months), “5 days”, “7 months”. If units are not given, they are assumed in years. |
gs-patient_status | E.g., “Hospitalized”, “Released”, “Live”, “Deceased” |
gs-seq_technology | Add the sequencer brand and model. See a list of options here. |
gs-orig_lab | Full name of laboratory from where sample originated. |
gs-orig_lab_addr | Complete building address of laboratory from where sample originated. |
gs-subm_lab | Full name of laboratory submitting this record to GISAID. See a list of options here. |
gs-subm_lab_addr | Complete building address of the submitting laboratory. |
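
Two of the format rules in the table above (`collection_date` as YYYY-MM-DD and `gs-virus_name` as four slash-delimited parts ending in a year) are easy to check mechanically. The following is a rough sketch under those stated rules only; `check_row` is a hypothetical helper, not a `seqsender` function, and it does not reproduce whatever validation GISAID itself performs.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_row(row):
    """Return a list of problems found in one metadata row (a dict of column -> value)."""
    problems = []

    collected = row.get("collection_date", "").strip()
    if not ISO_DATE.fullmatch(collected):
        problems.append("collection_date is not in YYYY-MM-DD format")
    else:
        try:
            date.fromisoformat(collected)  # rejects dates such as 2020-02-30
        except ValueError:
            problems.append("collection_date is not a valid calendar date")

    parts = [p.strip() for p in row.get("gs-virus_name", "").split("/")]
    if len(parts) != 4 or not all(parts):
        problems.append("gs-virus_name does not have four '/'-separated parts")
    elif not re.fullmatch(r"\d{4}", parts[3]):
        problems.append("gs-virus_name does not end in a four-digit year")

    return problems


# Made-up values for illustration.
print(check_row({"collection_date": "2020-03-25",
                 "gs-virus_name": "hCoV-19/USA/CA-1234/2020"}))  # []
```

Rows read with `csv.DictReader` can be passed to `check_row` one at a time, collecting any reported problems before the upload is attempted.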
NOTE: The prefix “gs-” is used to identify attributes for GISAID submissions.
To include additional attributes in EpiCoV submissions, just prepend `gs-` to the desired attribute names. Here is a list of optional attributes:
Column_name | Description |
---|---|
add_location | e.g. Cruise Ship, Convention, Live animal market |
add_host_info | e.g. Patient infected while traveling in |
sampling_strategy | e.g. Sentinel surveillance (ILI), Sentinel surveillance (ARI), Sentinel surveillance (SARI), Non-sentinel-surveillance (hospital), Non-sentinel-surveillance (GP network), Longitudinal sampling on same patient(s), S gene dropout |
specimen | e.g. Sputum, Alveolar lavage fluid, Oro-pharyngeal swab, Blood, Tracheal swab, Urine, Stool, Cloakal swab, Organ, Feces, Other |
outbreak | Date, Location e.g. type of gathering, Family cluster, etc. |
last_vaccinated | provide details if applicable |
treatment | Include drug name, dosage |
assembly_method | e.g. CLC Genomics Workbench 12, Geneious 10.2.4, SPAdes/MEGAHIT v1.2.9, UGENE v. 33, etc. |
coverage | e.g. 70x, 1,000x, 10,000x (average) |
provider_sample_id | Sample ID given by originating laboratory |
subm_sample_id | Sample ID given by the submitting laboratory |
consortium | Sequencing consortium the submitting lab is affiliated with |
comment | Comment |
comment_type | Comment icon |
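
As an illustration of the prefix rule for optional attributes, the sketch below renames worksheet columns so that any optional attribute from the table above gains the `gs-` prefix. The helper name `add_gisaid_prefix` is hypothetical; the attribute names are copied from the table.

```python
# Optional attribute names as listed in the table above.
OPTIONAL_ATTRIBUTES = frozenset({
    "add_location", "add_host_info", "sampling_strategy", "specimen",
    "outbreak", "last_vaccinated", "treatment", "assembly_method",
    "coverage", "provider_sample_id", "subm_sample_id", "consortium",
    "comment", "comment_type",
})


def add_gisaid_prefix(columns, optional=OPTIONAL_ATTRIBUTES):
    """Prefix optional attribute names with 'gs-'; leave other columns unchanged."""
    return [f"gs-{name}" if name in optional else name for name in columns]


print(add_gisaid_prefix(["sequence_name", "coverage", "gs-host"]))
# ['sequence_name', 'gs-coverage', 'gs-host']
```

Columns that already carry the prefix, or that are not optional attributes, pass through untouched.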
You are now ready to install `seqsender` and batch upload your submission.
Any questions or issues? Please report them on our Github issue tracker.