SOFTWARE REQUIREMENTS:
- Linux (64-bit) or Mac OS X (64-bit)
- Git version 2.25.1 or later
- Singularity version 3.8.7 or later
- Standard utilities: curl, tar, unzip
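The requirements above can be checked from a terminal before going further. The following is a minimal sketch and not part of seqsender: version_ge and check_tools are hypothetical helper names, and version_ge assumes plain dotted numeric versions such as 3.8.7.

```shell
# Hypothetical helpers for checking the software requirements listed above.

# version_ge A B: succeed when dotted numeric version A is >= version B.
version_ge() (
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        # Drop the leading component (and its dot, when there is one).
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
        x=${x:-0}; y=${y:-0}
        [ "$x" -gt "$y" ] && exit 0
        [ "$x" -lt "$y" ] && exit 1
    done
    exit 0
)

# check_tools TOOL...: print each tool that is not on PATH; fail if any is missing.
check_tools() (
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    exit "$missing"
)

# Example (not run here):
#   check_tools git singularity curl tar unzip
#   version_ge "$(git --version | awk '{print $3}')" 2.25.1
```

The same comparison can be applied to the Singularity version once its number has been extracted from the output of singularity --version.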
ADDITIONAL REQUIREMENTS:
See PRE-REQUISITES and REQUIREMENT FILES before proceeding to the next
steps.
(1) Convert the seqsender Docker image into a Singularity image
A seqsender Docker image is already built and stored on our DockerHub
registry: cdcgov/seqsender-dev:latest. You can pull the Docker image
directly from the registry, convert it into a Singularity image, and
store it in a destination of your choice. Create the destination
directory first if it does not exist yet.
mkdir -p ~/singularity
singularity build ~/singularity/seqsender.sif docker://cdcgov/seqsender-dev:latest
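When the build step is scripted, it helps to skip it if the image is already there. The sketch below is only an illustration: ensure_image is a hypothetical helper, and SINGULARITY is a hypothetical environment variable that overrides the executable so the function can be dry-run with SINGULARITY=echo.

```shell
# ensure_image SIF [SOURCE]: build SIF from SOURCE unless SIF already exists.
# SOURCE defaults to the DockerHub image named in this guide.
ensure_image() (
    sif=$1
    src=${2:-docker://cdcgov/seqsender-dev:latest}
    if [ -f "$sif" ]; then
        echo "image already present: $sif"
        exit 0
    fi
    # Make sure the destination directory exists before building.
    mkdir -p "$(dirname "$sif")"
    "${SINGULARITY:-singularity}" build "$sif" "$src"
)

# Example (not run here):
#   ensure_image "$HOME/singularity/seqsender.sif"
```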
(2) After the Singularity image is built successfully, we can go
ahead and use it to run seqsender.
Here is the command that shows the help messages of seqsender:
singularity exec ~/singularity/seqsender.sif seqsender-kickoff --help
Below is the standard output of the command.
usage: seqsender.py [-h]
{prep,submit,check_submission_status,template,version} ...
Automate the process of batch uploading consensus sequences and metadata to
databases of your choices
positional arguments:
{prep,submit,check_submission_status,template,version}
optional arguments:
-h, --help show this help message and exit
To see the arguments required for each command, for example, the
submit command, run
singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit --help
usage: seqsender.py submit [-h] [--biosample] [--sra] [--genbank] [--gisaid]
--organism {FLU,COV} --submission_name
SUBMISSION_NAME --submission_dir SUBMISSION_DIR
--config_file CONFIG_FILE --metadata_file
METADATA_FILE --fasta_file FASTA_FILE [--table2asn]
[--gff_file GFF_FILE] [--test]
Create submission files and then batch uploading them to databases of choices.
optional arguments:
-h, --help show this help message and exit
--biosample, -b Submit to BioSample database. (default: )
--sra, -s Submit to SRA database. (default: )
--genbank, -n Submit to Genbank database. (default: )
--gisaid, -g Submit to GISAID database. (default: )
--organism {FLU,COV} Type of organism data (default: FLU)
--submission_name SUBMISSION_NAME
Name of the submission (default: None)
--submission_dir SUBMISSION_DIR
Directory to where all required files (such as
metadata, fasta, etc.) are stored (default: None)
--config_file CONFIG_FILE
Config file stored in submission directory (default:
None)
--metadata_file METADATA_FILE
Metadata file stored in submission directory (default:
None)
--fasta_file FASTA_FILE
Fasta file stored in submission directory (default:
None)
--table2asn Whether to prepare a Table2asn submission. (default:
False)
--gff_file GFF_FILE An annotation file to add to a Table2asn submission
(default: None)
--test Whether to perform a test submission. (default: False)
(3) Submit a test submission
Rather than jumping straight into a production submission, we can
use GISAID’s and NCBI’s “TEST-SERVER” to upload a test submission
first. That way, submitters can familiarize themselves with the
submission process before making a real submission.
Note: Duplicate test submissions will result in an
error. Please create new sequence names each time you plan to run test
submissions to avoid this issue.
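One way to get fresh sequence names for each test run is to append a run-specific suffix to every FASTA header. This is a minimal sketch under stated assumptions: add_name_suffix is a hypothetical helper, the sequence name is assumed to be the first word of each header line, and the suffix is assumed to contain only letters, digits, '-' or '_'. The names in the metadata file have to be changed in the same way, which is not shown here.

```shell
# add_name_suffix SUFFIX: read FASTA on stdin and append _SUFFIX to the first
# word of every header line; sequence lines pass through unchanged.
add_name_suffix() {
    awk -v s="$1" '/^>/ { sub(/^>[^ \t]+/, "&_" s) } { print }'
}

# Example (not run here):
#   add_name_suffix "$(date +%Y%m%d%H%M%S)" < sequence.fasta > sequence.renamed.fasta
```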
Here we will go over the steps of preparing and batch uploading meta-
and sequence-data to GISAID and NCBI databases using a pre-processed
dataset provided with the software.
The template command outputs example metadata and config files that
you can use as a basis for your own submission before uploading a real
one. To get more help on the command, run
singularity exec ~/singularity/seqsender.sif seqsender-kickoff template --help
usage: seqsender.py template [-h] [--biosample] [--sra] [--genbank] [--gisaid]
--organism {FLU,COV} --submission_dir
SUBMISSION_DIR --submission_name SUBMISSION_NAME
Return a set of files (e.g., config file, metadata file, fasta files, etc.)
that are needed to make a submission
optional arguments:
-h, --help show this help message and exit
--biosample, -b Submit to BioSample. (default: )
--sra, -s Submit to SRA. (default: )
--genbank, -n Submit to Genbank. (default: )
--gisaid, -g Submit to GISAID. (default: )
--organism {FLU,COV} Type of organism data (default: FLU)
--submission_dir SUBMISSION_DIR
Directory to where all required files (such as
metadata, fasta, etc.) are stored (default: None)
--submission_name SUBMISSION_NAME
Name of the submission (default: None)
singularity exec ~/singularity/seqsender.sif seqsender-kickoff template \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission
- --organism specifies the type of data to download. Currently,
Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
Additional datasets for other organisms will be provided in future
updates or upon request.
- -bsng is a combination flag of databases: BioSample (-b or
--biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID
(-g or --gisaid). This combination flag tells seqsender to generate
unified meta- and sequence-data in one file so we can perform a batch
upload to all databases simultaneously.
- --submission_dir is the directory where you store all of your
submission histories.
- --submission_name is the submission folder inside the
--submission_dir directory. It contains all the files needed to make a
submission (such as config.yaml, metadata.csv, sequence.fasta, raw
reads, etc.).
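The mapping between database names and the letters of the combination flag can be written down as a small helper. This is a sketch only: db_flags is a hypothetical function, and it assumes that any subset of the four letters can be combined in the same way as -bsng.

```shell
# db_flags NAME...: turn database names into a combination flag such as -bsng.
db_flags() (
    flags=
    for db in "$@"; do
        case $db in
            biosample) flags="${flags}b" ;;
            sra)       flags="${flags}s" ;;
            genbank)   flags="${flags}n" ;;
            gisaid)    flags="${flags}g" ;;
            *) echo "unknown database: $db" >&2; exit 2 ;;
        esac
    done
    [ -n "$flags" ] || { echo "no database given" >&2; exit 2; }
    echo "-$flags"
)

# Example:
#   db_flags biosample sra genbank gisaid   # prints -bsng
```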
[Figure: a quick look at the output files]
Here is the standard output of the template command.
Generating submission template
Files are stored at: /home/snu3/flu-test-submission
Total runtime (HRS:MIN:SECS): 0:00:00.115140
2. Set up the config file – config.yaml
After the template is downloaded in (1), you can find config.yaml in
your local $HOME/flu-test-submission directory. The config.yaml file
provides a brief description of the submission and contains the user
credentials that allow seqsender to authenticate with each database
before uploading a submission.
Open that file with a text editor of your choice and fill in the
appropriate information about your submission.
NOTE:
- To submit to NCBI only, remove the GISAID Submission (b) section
from the config file. Conversely, to submit to GISAID only, remove the
NCBI Submission (a) section.
- Submission_Position determines the order in which the databases are
submitted to. For instance, if GISAID is set as 1, seqsender will
submit to GISAID first; then, after all samples are assigned a GISAID
accession number, seqsender will proceed to submit to NCBI. This order
of submission ensures samples are linked correctly between the two
databases after submission.
- Username and Password under the NCBI Submission (a) section are the
credentials used to authenticate with the NCBI FTP Server (not to be
confused with an individual NCBI account). See PRE-REQUISITES for more
details.
ADDITIONAL REQUIREMENTS:
- If SRA is in your list of submitting databases, the raw reads for
all samples must be provided and stored in a subfolder called
raw_reads inside your submission directory of choice.
- If GISAID is in your list of submitting databases, download the CLI
package associated with your organism of interest (e.g., Influenza A
Virus (FLU) or SARS-COV-2 (COV)) from the GISAID platform and store it
in a subfolder called gisaid_cli inside your submission directory of
choice.
[Figure: where to store the downloaded GISAID CLI package]
Important: Make sure your binary CLI package is executable. To grant
executable permissions, run
chmod a+x <your_gisaid_cli_binary>
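The two additional requirements can be verified with a short check before uploading. This is a minimal sketch, not a seqsender feature: check_extras is a hypothetical helper, DIR is whichever directory holds raw_reads and gisaid_cli in your setup, and FLAGS is assumed to be a short combination flag such as -bsng.

```shell
# check_extras DIR FLAGS: report unmet additional requirements for the
# databases selected by the short combination flag FLAGS (e.g. -bsng).
check_extras() (
    dir=$1 flags=$2 status=0
    case $flags in
        *s*)
            # SRA selected: raw_reads must exist and contain at least one entry.
            if [ ! -d "$dir/raw_reads" ] || [ -z "$(ls -A "$dir/raw_reads")" ]; then
                echo "raw_reads is missing or empty"
                status=1
            fi ;;
    esac
    case $flags in
        *g*)
            # GISAID selected: gisaid_cli must contain an executable file.
            found=0
            for f in "$dir"/gisaid_cli/*; do
                if [ -f "$f" ] && [ -x "$f" ]; then found=1; fi
            done
            if [ "$found" -eq 0 ]; then
                echo "gisaid_cli has no executable file"
                status=1
            fi ;;
    esac
    exit "$status"
)
```

The function prints one line per unmet requirement and returns a non-zero status if anything is missing, so it can gate a submit command in a script.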
3. Upload a test submission
singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test
- --organism specifies the type of data to upload. Currently,
Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
- -bsng is a combination flag of databases: BioSample (-b or
--biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID
(-g or --gisaid). This combination flag tells seqsender to prep and
submit to each given database. See
singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit --help
for more details.
- --submission_dir is the directory where you store all of your
submission histories.
- --submission_name is the submission folder inside the
--submission_dir directory. It contains all the files needed to make a
submission (such as config.yaml, metadata.csv, sequence.fasta, raw
reads, etc.).
- --config_file is the config file inside the --submission_name
directory.
- --metadata_file is the metadata file inside the --submission_name
directory.
- --fasta_file is the fasta file inside the --submission_name
directory.
- --test is used to submit to the test server (“TEST-SERVER ONLY”).
For a production submission, remove this flag.
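Because the only difference between a test and a production upload is the --test flag, it can be useful to make that choice explicit. The sketch below only prints the command for review and does not run it; submit_command and its MODE argument are hypothetical, and the remaining arguments are the ones used in this walkthrough.

```shell
# submit_command MODE NAME: print the submit command for submission folder NAME.
# MODE must be "test" or "production"; anything else is rejected so that a
# production upload is never the silent default.
submit_command() (
    mode=$1 name=$2
    case $mode in
        test) extra=" --test" ;;
        production) extra= ;;
        *) echo "mode must be 'test' or 'production'" >&2; exit 2 ;;
    esac
    echo "singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit" \
        "--organism FLU -bsng --submission_dir \$HOME --submission_name $name" \
        "--config_file config.yaml --metadata_file metadata.csv" \
        "--fasta_file sequence.fasta$extra"
)

# Example:
#   submit_command test flu-test-submission
```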
A quick look at the standard output of the submit command.
Creating submission files for BIOSAMPLE
Files are stored at: /home/snu3/flu-test-submission/submission_files/BIOSAMPLE
Creating submission files for SRA
Files are stored at: /home/snu3/flu-test-submission/submission_files/SRA
Creating submission files for GENBANK
Files are stored at: /home/snu3/flu-test-submission/submission_files/GENBANK
Creating submission files for GISAID
Files are stored at: /home/snu3/flu-test-submission/submission_files/GISAID
Uploading submission files to NCBI-BIOSAMPLE
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to NCBI-SRA
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to GISAID-FLU
Performing a 'Test' submission with Client-Id: TEST-EA76875B00C3
If this is not a 'Test' submission, interrupts submission immediately.
Submission attempt: 1
Uploading successfully
Status report is stored at: /home/snu3/flu-test-submission/submission_report_status.csv
Log file is stored at: /home/snu3/flu-test-submission/submission_files/GISAID/gisaid_upload_log_attempt_1.txt
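If the output of the submit command is saved to a file, the locations of the status report and the upload log can be pulled out of it. This is a minimal sketch: report_paths is a hypothetical helper, and it assumes the two lines are printed exactly as in the sample output above.

```shell
# report_paths: read captured submit output on stdin and print the status
# report path and the log file path, one per line.
report_paths() {
    sed -n \
        -e 's/^Status report is stored at: //p' \
        -e 's/^Log file is stored at: //p'
}

# Example (not run here), with the output previously saved to submit.log:
#   report_paths < submit.log
```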
4. Check the status of a submission
After a submission has been uploaded, you can check its status
periodically.
singularity exec ~/singularity/seqsender.sif seqsender-kickoff check_submission_status \
--organism FLU \
--submission_dir $HOME \
--submission_name flu-test-submission \
--test
- --organism specifies the type of data. Currently, Influenza A Virus
(FLU) and SARS-COV-2 (COV) are the only two options.
- --submission_dir is the directory where you store all of your
submission histories.
- --submission_name is the submission folder inside the
--submission_dir directory. It contains all the files needed to make a
submission (such as config.yaml, metadata.csv, sequence.fasta, raw
reads, etc.).
- --test indicates that the submission was made to the test server
(“TEST-SERVER ONLY”). For a production submission, remove this flag.
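Checking the status repeatedly can be automated with a generic retry loop. This is a sketch only: retry is a hypothetical helper that reruns any command until it succeeds, and the command you pass to it (for example a small script that runs check_submission_status and inspects its output) is up to you.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Fails if CMD never succeeds.
retry() (
    max=$1 delay=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@"; then exit 0; fi
        if [ "$i" -lt "$max" ]; then sleep "$delay"; fi
        i=$((i + 1))
    done
    exit 1
)

# Example (not run here), where ./status_is_final.sh is a hypothetical script:
#   retry 12 300 ./status_is_final.sh
```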
Here is a quick look at the standard output of the
check_submission_status command:
Checking submission status for:
Submission name: flu-test-submission
Submission organism: FLU
Submission type: Test
Submission database: GISAID
Submission status: processed-ok
Submission database: BIOSAMPLE
Pulling down report.xml
Submission status: submitted
Submission database: SRA
Pulling down report.xml
Submission status: submitted
Submission database: GENBANK
Submission status: ---
Total runtime (HRS:MIN:SECS): 0:00:08.213955
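For scripting, the output above can be reduced to one line per database. This is a minimal sketch that assumes the "Submission database:" and "Submission status:" lines appear as in the sample output; status_pairs is a hypothetical helper name.

```shell
# status_pairs: read check_submission_status output on stdin and print one
# "DATABASE STATUS" pair per line.
status_pairs() {
    awk -F': ' '
        { sub(/^[ \t]+/, "") }                      # tolerate indented lines
        $1 == "Submission database" { db = $2 }
        $1 == "Submission status" && db != "" { print db, $2; db = "" }
    '
}
```

Applied to the sample output, it prints GISAID processed-ok, BIOSAMPLE submitted, SRA submitted and GENBANK ---, each on its own line.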
Here is a list of submission statuses and their meanings:
- If at least one action has Processed-error, the submission status is
Processed-error
- Otherwise, if at least one action has the Processing state, the
whole submission is Processing
- Otherwise, if at least one action has the Queued state, the whole
submission is Queued
- Otherwise, if at least one action has the Deleted state, the whole
submission is Deleted
- Otherwise, if all actions have Processed-ok, the submission status
is Processed-ok
- Otherwise, the submission status is Submitted
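The precedence rules in this list can be expressed as a small function. This is an illustrative sketch of the rules as stated here, not code taken from seqsender; overall_status is a hypothetical name, and statuses are compared case-insensitively because the tool prints them in lower case.

```shell
# overall_status STATUS...: apply the precedence rules to the statuses of the
# individual actions and print the status of the whole submission.
overall_status() (
    [ "$#" -gt 0 ] || { echo "at least one action status is required" >&2; exit 2; }
    error=0 processing=0 queued=0 deleted=0 all_ok=1
    for s in "$@"; do
        case $(printf '%s' "$s" | tr '[:upper:]' '[:lower:]') in
            processed-error) error=1 all_ok=0 ;;
            processing)      processing=1 all_ok=0 ;;
            queued)          queued=1 all_ok=0 ;;
            deleted)         deleted=1 all_ok=0 ;;
            processed-ok)    ;;
            *)               all_ok=0 ;;
        esac
    done
    if [ "$error" -eq 1 ]; then echo "Processed-error"
    elif [ "$processing" -eq 1 ]; then echo "Processing"
    elif [ "$queued" -eq 1 ]; then echo "Queued"
    elif [ "$deleted" -eq 1 ]; then echo "Deleted"
    elif [ "$all_ok" -eq 1 ]; then echo "Processed-ok"
    else echo "Submitted"
    fi
)

# Example:
#   overall_status processed-ok queued processing   # prints Processing
```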
Before you can perform a test submission with your own dataset, make
sure you have the required files (such as config.yaml, metadata.csv,
sequence.fasta, raw reads, etc.) already prepared and stored in the
submission directory of your choice.
(a) To prep for FLU submissions, select one of the databases below for
more details:
BioSample
SRA
Genbank
GISAID
Multiple databases
(b) To prep for COV submissions, select one of the databases below for
more details:
BioSample
SRA
Genbank
GISAID
Multiple databases
After you have finished prepping for your databases of choice in (a)
or (b), create a submission folder and store all your metadata and
sequence files there.
[Figure: a quick look at the folder structure]
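Creating the submission folder and copying the prepared files into it can be scripted. The following is a minimal sketch: stage_submission is a hypothetical helper, and the file names in the example are the ones used throughout this guide.

```shell
# stage_submission BASE NAME FILE...: create the folder BASE/NAME, copy each
# FILE into it, and print the folder path.
stage_submission() (
    base=$1 name=$2
    shift 2
    mkdir -p "$base/$name" || exit 1
    for f in "$@"; do
        cp "$f" "$base/$name/" || exit 1
    done
    echo "$base/$name"
)

# Example (not run here):
#   stage_submission "$HOME" flu-test-submission config.yaml metadata.csv sequence.fasta
```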
Finally, make sure the additional requirements below are met before
you proceed to the next steps.
- If SRA is in your list of submitting databases, the raw reads for
all samples must be provided and stored in a subfolder called
raw_reads inside your submission directory of choice.
- If GISAID is in your list of submitting databases, download the CLI
package associated with your organism of interest (e.g., Influenza A
Virus (FLU) or SARS-COV-2 (COV)) from the GISAID platform and store it
in a subfolder called gisaid_cli inside your submission directory of
choice.
[Figure: an example of where to place the GISAID CLI package]
Important: Make sure your binary CLI package is executable. To grant
executable permissions, run
chmod a+x <your_gisaid_cli_binary>
2. Upload a test submission
After all files in (i) are prepared, we can go ahead and upload the
submission.
singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test
- --organism specifies the type of data to upload. Currently,
Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
- -bsng is a combination flag of databases: BioSample (-b or
--biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID
(-g or --gisaid). This combination flag tells seqsender to prep and
submit to each given database. See
singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit --help
for more details.
- --submission_dir is the directory where you store all of your
submission histories.
- --submission_name is the submission folder inside the
--submission_dir directory. It contains all the files needed to make a
submission (such as config.yaml, metadata.csv, sequence.fasta, raw
reads, etc.).
- --config_file is the config file inside the --submission_name
directory.
- --metadata_file is the metadata file inside the --submission_name
directory.
- --fasta_file is the fasta file inside the --submission_name
directory.
- --test is used to submit to the test server (“TEST-SERVER ONLY”).
For a production submission, remove this flag.
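Since the config, metadata, and fasta files are all looked up inside the submission folder, a quick check that they are really there can save a failed run with your own data. This is a sketch under that reading of the flags; preflight is a hypothetical helper and is not part of seqsender.

```shell
# preflight DIR NAME FILE...: check that every FILE exists and is non-empty
# inside DIR/NAME, where DIR and NAME mirror --submission_dir and
# --submission_name. Prints one line per problem found.
preflight() (
    folder=$1/$2
    shift 2
    status=0
    [ -d "$folder" ] || { echo "missing folder: $folder"; exit 1; }
    for f in "$@"; do
        if [ ! -s "$folder/$f" ]; then
            echo "missing or empty: $folder/$f"
            status=1
        fi
    done
    exit "$status"
)

# Example (not run here):
#   preflight "$HOME" flu-test-submission config.yaml metadata.csv sequence.fasta
```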
A quick look at the standard output of the submit command.
Creating submission files for BIOSAMPLE
Files are stored at: /home/snu3/flu-test-submission/submission_files/BIOSAMPLE
Creating submission files for SRA
Files are stored at: /home/snu3/flu-test-submission/submission_files/SRA
Creating submission files for GENBANK
Files are stored at: /home/snu3/flu-test-submission/submission_files/GENBANK
Creating submission files for GISAID
Files are stored at: /home/snu3/flu-test-submission/submission_files/GISAID
Uploading submission files to NCBI-BIOSAMPLE
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to NCBI-SRA
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to GISAID-FLU
Performing a 'Test' submission with Client-Id: TEST-EA76875B00C3
If this is not a 'Test' submission, interrupts submission immediately.
Submission attempt: 1
Uploading successfully
Status report is stored at: /home/snu3/flu-test-submission/submission_report_status.csv
Log file is stored at: /home/snu3/flu-test-submission/submission_files/GISAID/gisaid_upload_log_attempt_1.txt
3. Check the status of a submission
After a submission has been uploaded, you can check its status
periodically.
singularity exec ~/singularity/seqsender.sif seqsender-kickoff check_submission_status \
--organism FLU \
--submission_dir $HOME \
--submission_name flu-test-submission \
--test
- --organism specifies the type of data. Currently, Influenza A Virus
(FLU) and SARS-COV-2 (COV) are the only two options.
- --submission_dir is the directory where you store all of your
submission histories.
- --submission_name is the submission folder inside the
--submission_dir directory. It contains all the files needed to make a
submission (such as config.yaml, metadata.csv, sequence.fasta, raw
reads, etc.).
- --test indicates that the submission was made to the test server
(“TEST-SERVER ONLY”). For a production submission, remove this flag.
Here is a quick look at the standard output:
Checking submission status for:
Submission name: flu-test-submission
Submission organism: FLU
Submission type: Test
Submission database: GISAID
Submission status: processed-ok
Submission database: BIOSAMPLE
Pulling down report.xml
Submission status: submitted
Submission database: SRA
Pulling down report.xml
Submission status: submitted
Submission database: GENBANK
Submission status: ---
Total runtime (HRS:MIN:SECS): 0:00:08.213955
Here is a list of submission statuses and their meanings:
- If at least one action has Processed-error, the submission status is
Processed-error
- Otherwise, if at least one action has the Processing state, the
whole submission is Processing
- Otherwise, if at least one action has the Queued state, the whole
submission is Queued
- Otherwise, if at least one action has the Deleted state, the whole
submission is Deleted
- Otherwise, if all actions have Processed-ok, the submission status
is Processed-ok
- Otherwise, the submission status is Submitted
Any questions or issues? Please report them on our GitHub issue
tracker.