BioSample is an NCBI database that stores descriptions of the biological source materials used in experimental assays.
Before submitters can upload their experimental samples to the BioSample database using `seqsender`, they must ensure that the required files (such as `config.yaml`, `metadata.csv`, `sequence.fasta`, raw reads, etc.) are prepared ahead of time and stored in a submission folder of their choice (e.g., `submission_name`) within a parent submission directory (e.g., `submission_dir`). That way, `seqsender` can pick up the necessary files in that folder, generate the submission files, and then batch upload them to the databases of choice.
A quick look at where to store all of the required files
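As a rough illustration of the layout described above, the following sketch checks that a submission folder inside the parent directory holds the files named in the text. The helper name `missing_files` is hypothetical and not part of `seqsender`, and treating exactly these three files as required is an assumption; the files actually needed depend on the target database.

```python
from pathlib import Path

# File names taken from the examples in the text. Which of them are
# mandatory for a given database is an assumption of this sketch.
EXPECTED_FILES = ("config.yaml", "metadata.csv", "sequence.fasta")


def missing_files(submission_dir, submission_name, expected=EXPECTED_FILES):
    """Return the expected file names not found in submission_dir/submission_name."""
    folder = Path(submission_dir) / submission_name
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # Placeholder directory names from the text; replace with real paths.
    missing = missing_files("submission_dir", "submission_name")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Running a check like this before invoking `seqsender` can surface a misplaced file early, before any upload is attempted.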
The config file is a YAML file that provides a brief description of the submission and contains the user credentials that allow `seqsender` to authenticate with the database before uploading a submission.
NOTE: When the value is `1`, seqsender will submit to GISAID first; then, after all samples have been assigned a GISAID accession number, seqsender will proceed to submit to NCBI. This order of submission ensures that samples are linked correctly between the two databases.

The metadata worksheet is a comma-delimited (csv) file that contains required attributes that are useful for the rapid analysis and trace back of Influenza A Virus or SARS-CoV-2 cases.
Here is a short description of the fields in the metadata worksheet.
Column_name | Description |
---|---|
sequence_name | Sequence identifier used in the FASTA file. This is used to create the FASTA file for GenBank or GISAID. |
organism | The most descriptive organism name for the samples. If relevant, you can search the organism name in the NCBI Taxonomy database. For FLU, organism must be “Influenza A Virus”. For COV, organism must be “Severe acute respiratory syndrome coronavirus 2”. |
collection_date | The date on which the sample was collected; must be in the ISO format: YYYY-MM-DD. For example: 2020-03-25 |
authors | Citing authors. List of Last, First Middle, suffix separated by a semicolon “;” E.g.: “Baker, Howard Henry, Jr.; Powell, Earl Alexander, III.;” |
ncbi-spuid | Submitter Provided Unique Identifiers. This is used to report back assigned accessions as well as for cross-linking objects within submission. |
ncbi-spuid_namespace | If SPUID is used, spuid_namespace has to be provided. The values of spuid_namespace are from controlled vocabulary and need to be coordinated with NCBI prior to submission. |
ncbi-bioproject | Associated BioProject accession number. For example: PRJNA217342 |
bs-description | A brief description about the sample, e.g. SARS-CoV-2 Sequencing Baseline Constellation. |
bs-collected_by | Name of persons or institute who collected the sample. |
bs-geo_loc_name | Geographical origin of the sample; use the appropriate name from this list. Use a colon to separate the country or ocean from more detailed information about the location, e.g., “Canada: Vancouver” or “Germany: halfway down Zugspitze, Alps”. Entering multiple localities in one attribute is not allowed. |
bs-host | The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, e.g., Homo sapiens. |
bs-host_disease | Name of relevant disease, e.g., Salmonella gastroenteritis. Controlled vocabulary; please see Human Disease Ontology or MeSH. |
bs-isolate | Identification or description of the specific individual from which this sample was obtained. |
bs-isolation_source | Describes the physical, environmental and/or local geographical source of the biological sample from which the sample was derived. |
bs-lat_lon | The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format “d[d.dddd] N\|S d[dd.dddd] W\|E”, e.g., 38.98 N 77.11 W |
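The format rules in the table above can be checked before submission. The sketch below is illustrative only: the helper names (`is_iso_date`, `split_authors`, `check_row`) are hypothetical, it does not reproduce whatever validation `seqsender` itself performs, and treating an empty `bs-lat_lon` as acceptable is an assumption.

```python
import re
from datetime import datetime

# Pattern for "d[d.dddd] N|S d[dd.dddd] W|E", e.g. "38.98 N 77.11 W".
LAT_LON = re.compile(r"^\d{1,2}(\.\d+)? [NS] \d{1,3}(\.\d+)? [WE]$")


def is_iso_date(value):
    """True if value is a real calendar date written exactly as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts "2020-3-25", so compare against the padded form.
    return parsed.strftime("%Y-%m-%d") == value


def split_authors(value):
    """Split a semicolon-separated author list, dropping empty entries."""
    return [name.strip() for name in value.split(";") if name.strip()]


def check_row(row):
    """Return a list of problems found in one metadata row (a dict)."""
    problems = []
    if not is_iso_date(row.get("collection_date", "")):
        problems.append("collection_date must be YYYY-MM-DD")
    lat_lon = row.get("bs-lat_lon", "")
    # Assumption: an empty bs-lat_lon is allowed and only non-empty values are checked.
    if lat_lon and not LAT_LON.match(lat_lon):
        problems.append("bs-lat_lon must look like '38.98 N 77.11 W'")
    return problems


if __name__ == "__main__":
    row = {"collection_date": "2020-03-25", "bs-lat_lon": "38.98 N 77.11 W"}
    print(check_row(row))
    print(split_authors("Baker, Howard Henry, Jr.; Powell, Earl Alexander, III.;"))
```

The regular expression only checks the shape of the coordinates; it does not verify that latitude stays within 90 degrees or longitude within 180.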
NOTE: The prefix `bs-` is used to identify attributes for BioSample submissions. To include additional attributes in BioSample submissions, just prepend `bs-` to the desired attributes, e.g. `bs-host_age`, `bs-host_sex`, etc. See the Pathogen.cl.1.0 package for more attributes.
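To make the prefix convention concrete, here is a minimal sketch that pulls the `bs-` columns out of a metadata row and strips the prefix. The function `biosample_attributes` is hypothetical, the sample values are made up for illustration, and nothing here is meant to describe how `seqsender` processes the worksheet internally.

```python
import csv
import io

BS_PREFIX = "bs-"


def biosample_attributes(row):
    """Map BioSample attribute names (prefix removed) to their values."""
    return {
        key[len(BS_PREFIX):]: value
        for key, value in row.items()
        if key.startswith(BS_PREFIX)
    }


if __name__ == "__main__":
    # Made-up worksheet content; bs-host_age and bs-host_sex are the
    # additional attributes mentioned in the text.
    sample = io.StringIO(
        "sequence_name,organism,bs-host,bs-host_age,bs-host_sex\n"
        "seq1,Influenza A Virus,Homo sapiens,35,female\n"
    )
    for row in csv.DictReader(sample):
        print(biosample_attributes(row))
```

Columns without the prefix, such as `sequence_name` and `organism`, are left out of the result, which mirrors the idea that only `bs-` columns are destined for BioSample.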
You are now ready to install `seqsender` and batch upload your submission.
Any questions or issues? Please report them on our GitHub issue tracker.