How to run seqsender with Singularity

SOFTWARE REQUIREMENTS:

Linux (64-bit) or Mac OS X (64-bit)
Git version 2.25.1 or later
Singularity version 3.8.7 or later
Standard utilities: curl, tar, unzip

ADDITIONAL REQUIREMENTS:

See PRE-REQUISITES and REQUIREMENT FILES before proceeding to the next steps

(1) Convert seqsender Docker image into a Singularity image

There is a seqsender Docker image already built and stored on our DockerHub registry: cdcgov/seqsender-dev:latest. You can directly pull the Docker Image down from the registry, convert it into a Singularity image, and store it in a destination of your choice.

singularity build ~/singularity/seqsender.sif docker://cdcgov/seqsender-dev:latest

(2) After the Singularity image is built successfully, we can go ahead and use it to run `seqsender`.

Here is the command that shows the help messages of seqsender

mkdir ~/singularity
singularity exec ~/singularity/seqsender.sif seqsender-kickoff --help

Below is the standard out of the command.

usage: seqsender.py [-h]
                    {prep,submit,check_submission_status,template,version} ...

Automate the process of batch uploading consensus sequences and metadata to
databases of your choices

positional arguments:
  {prep,submit,check_submission_status,template,version}

optional arguments:
  -h, --help            show this help message and exit

To see the arguments required for each command, for example, the submit command, run

singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit --help

usage: seqsender.py submit [-h] [--biosample] [--sra] [--genbank] [--gisaid]
                           --organism {FLU,COV} --submission_name
                           SUBMISSION_NAME --submission_dir SUBMISSION_DIR
                           --config_file CONFIG_FILE --metadata_file
                           METADATA_FILE --fasta_file FASTA_FILE [--table2asn]
                           [--gff_file GFF_FILE] [--test]

Create submission files and then batch uploading them to databases of choices.

optional arguments:
  -h, --help            show this help message and exit
  --biosample, -b       Submit to BioSample database. (default: )
  --sra, -s             Submit to SRA database. (default: )
  --genbank, -n         Submit to Genbank database. (default: )
  --gisaid, -g          Submit to GISAID database. (default: )
  --organism {FLU,COV}  Type of organism data (default: FLU)
  --submission_name SUBMISSION_NAME
                        Name of the submission (default: None)
  --submission_dir SUBMISSION_DIR
                        Directory to where all required files (such as
                        metadata, fasta, etc.) are stored (default: None)
  --config_file CONFIG_FILE
                        Config file stored in submission directory (default:
                        None)
  --metadata_file METADATA_FILE
                        Metadata file stored in submission directory (default:
                        None)
  --fasta_file FASTA_FILE
                        Fasta file stored in submission directory (default:
                        None)
  --table2asn           Whether to prepare a Table2asn submission. (default:
                        False)
  --gff_file GFF_FILE   An annotation file to add to a Table2asn submission
                        (default: None)
  --test                Whether to perform a test submission. (default: False)

(3) Submit a `test` submission

Rather than hastily jump in and submit a production submission right away, we can utilize GISAID’s and NCBI’s “TEST-SERVER” to upload a test submission first. That way submitter can familiarize themselves with the submission process prior to make a real submission.

Note: Duplicate test submissions will result in an error. Please create new sequence names each time you plan to run test submissions to avoid this issue.

Submit a `test` submission with a pre-processed dataset

Here we will go over the steps of preparing and batch uploading meta- and sequence-data to GISAID and NCBI databases using a pre-processed dataset provided with the software.

The template command will allow you to output examples of metadata and config files so you can base your submission on prior to upload a real submission. To get more help on the command, run

singularity exec ~/singularity/seqsender.sif seqsender-kickoff template --help

usage: seqsender.py template [-h] [--biosample] [--sra] [--genbank] [--gisaid]
                             --organism {FLU,COV} --submission_dir
                             SUBMISSION_DIR --submission_name SUBMISSION_NAME

Return a set of files (e.g., config file, metadata file, fasta files, etc.)
that are needed to make a submission

optional arguments:
  -h, --help            show this help message and exit
  --biosample, -b       Submit to BioSample. (default: )
  --sra, -s             Submit to SRA. (default: )
  --genbank, -n         Submit to Genbank. (default: )
  --gisaid, -g          Submit to GISAID. (default: )
  --organism {FLU,COV}  Type of organism data (default: FLU)
  --submission_dir SUBMISSION_DIR
                        Directory to where all required files (such as
                        metadata, fasta, etc.) are stored (default: None)
  --submission_name SUBMISSION_NAME
                        Name of the submission (default: None)

1. Download the pre-processed meta- and sequence-data

singularity exec ~/singularity/seqsender.sif seqsender-kickoff template \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission

--organism specifies the type of data to download. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options. Additional datasets for other organisms will be provided in future updates or requests.
-bsng is a combination flag of databases: Biosample (-b or --biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID (-g or --gisaid). This combination flag tells seqsender to generate an unified meta- and sequence-data into one file so we can perform batch upload to all databases simultaneously.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.

A quick look at the output files:

Here is the standard out of the command.

Generating submission template
Files are stored at: /home/snu3/flu-test-submission

Total runtime (HRS:MIN:SECS): 0:00:00.115140

2. Set up the config file – `config.yaml`

After the template is downloaded in (1), you can find config.yaml in your local $HOME/flu-test-submission directory. The config.yaml yaml file provides a brief description about the submission and contains user credentials that allow seqsender to authenticate the database prior to upload a submission.

Open that file with a text editor of your choice and fill in the appropriate information about your submission.

NOTE:

To submit to NCBI only, one can remove the GISAID Submission (b) section from the config file. Vice versa, to submit to GISAID only, just remove the NCBI Submission (a) section.
Submission_Position determines the order of the database in which we will submit first. For instance, if GISAID is set as 1, seqsender will submit to GISAID first, then after all samples are assigned with a GISAID accession number, seqsender will proceed to submit to NCBI. This order of submission ensures samples are linked correctly between the two databases after submission.
Username and Password under the NCBI Submission (b) section are the credentials used to authenticate the NCBI FTP Server (not to mistake with individual NCBI account). See PRE-REQUISITES for more details.

ADDITIONAL REQUIREMENTS:

If SRA is in your list of submitting databases, the raw reads for all samples must be provided and stored in a subfolder called raw_reads inside your submission directory of choice.
If GISAID is in your list of submitting databases, download the CLI package that associated with your organism of interest (e.g, Influenza A Virus (FLU) or SARS-COV-2 (COV)) from the GISAID platform and stored them in a subfolder called gisaid_cli inside your submission directory of choice.

A quick look of where to store the downloaded GISAID CLI package,

Important: Make sure you binary CLI package are executable. To allow executable permissions, run

chmod a+x <your_gisaid_cli_binary>

3. Upload a test submission

singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test

--organism specifies the type of data to upload. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
-bsng is a combination flag of databases: Biosample (-b or --biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID (-g or --gisaid). This combination flag tells seqsender to prep and submit to each given database. See docker exec -it seqsender bash seqsender-kickoff submit --help for more details.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.
--config_file is the config file inside the --submission_name directory.
--metadata_file is the metadata file inside the --submission_name directory.
--fasta_file is the fasta file inside the --submission_name directory.
--test is used to submit to “TEST-SERVER ONLY” . For production submission, please remove this flag.

A quick look at the standard output.

Creating submission files for BIOSAMPLE
Files are stored at: /home/snu3/flu-test-submission/submission_files/BIOSAMPLE

Creating submission files for SRA
Files are stored at: /home/snu3/flu-test-submission/submission_files/SRA

Creating submission files for GENBANK
Files are stored at: /home/snu3/flu-test-submission/submission_files/GENBANK

Creating submission files for GISAID
Files are stored at: /home/snu3/flu-test-submission/submission_files/GISAID

Uploading submission files to NCBI-BIOSAMPLE
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.

Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'


Uploading submission files to NCBI-SRA
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.

Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'

Uploading submission files to GISAID-FLU
Performing a 'Test' submission with Client-Id: TEST-EA76875B00C3
If this is not a 'Test' submission, interrupts submission immediately.

Submission attempt: 1
Uploading successfully
Status report is stored at: /home/snu3/flu-test-submission/submission_report_status.csv
Log file is stored at: /home/snu3/flu-test-submission/submission_files/GISAID/gisaid_upload_log_attempt_1.txt

4. Check the status of a submission

After a submission is submitted, you can routinely check the status of the submission.

singularity exec ~/singularity/seqsender.sif seqsender-kickoff check_submission_status \
--organism FLU \
--submission_dir $HOME \
--submission_name flu-test-submission \
--test

--organism specifies the type of data. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.
--test is used to submit to “TEST-SERVER ONLY” . For production submission, please remove this flag.

Here is a quick look at the standard output:

Checking submission status for:

Submission name: flu-test-submission
Submission organism: FLU
Submission type: Test

Submission database: GISAID
Submission status: processed-ok

Submission database: BIOSAMPLE
Pulling down report.xml
Submission status: submitted

Submission database: SRA
Pulling down report.xml
Submission status: submitted

Submission database: GENBANK
Submission status: ---

Total runtime (HRS:MIN:SECS): 0:00:08.213955

Here is a list of submission statuses and its meanings:

If at least one action has Processed-error, submission status is Processed-error

Otherwise if at least one action has Processing state, the whole submission is Processing

Otherwise, if at least one action has Queued state, the whole submission is Queued

Otherwise, if at least one action has Deleted state, the whole submission is Deleted

If all actions have Processed-ok, submission status is Processed-ok

Otherwise submission status is Submitted

Submit a `test` submission with your own dataset

Before you can perform a test submission with your own dataset, make sure you have the required files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) already prepared and stored in the submission directory of your choice.

1. Assemble your meta- and sequence-data

To prep for FLU submissions, select one of the databases below for more details

BioSample
SRA
Genbank
GISAID
Multiple databases

To prep for COV submissions, select one of the databases below for more details

BioSample
SRA
Genbank
GISAID
Multiple databases

After you have finished prepping for your database of choices in (a) or (b), create a submission folder and store all your metadata and sequence files there.

Here is a quick look at the folder structure

Finally, make sure additional requirements below are met before you can proceed to the next steps.

If SRA is in your list of submitting databases, the raw reads for all samples must be provided and stored in a subfolder called raw_reads inside your submission directory of choice.
If GISAID is in your list of submitting databases, download the CLI package that associated with your organism of interest (e.g, Influenza A Virus (FLU) or SARS-COV-2 (COV)) from the GISAID platform and stored them in a subfolder called gisaid_cli inside your submission directory of choice.

Here is an example of where to place the GISAID CLI package.

Important: Make sure you binary CLI package are executable. To allow executable permissions, run

chmod a+x <your_gisaid_cli_binary>

2. Upload a test submission

After all files are (i) are prepared, we can go ahead and upload the submission

singularity exec ~/singularity/seqsender.sif seqsender-kickoff submit \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test

--organism specifies the type of data to upload. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
-bsng is a combination flag of databases: Biosample (-b or --biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID (-g or --gisaid). This combination flag tells seqsender to prep and submit to each given database. See docker exec -it seqsender bash seqsender-kickoff submit --help for more details.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.
--config_file is the config file inside the --submission_name directory.
--metadata_file is the metadata file inside the --submission_name directory.
--fasta_file is the fasta file inside the --submission_name directory.
--test is used to submit to “TEST-SERVER ONLY” . For production submission, please remove this flag.