How to run seqsender locally

SOFTWARE REQUIREMENTS:

Linux (64-bit) or Mac OS X (64-bit)
Git version 2.25.1 or later
Standard utilities: curl, tar, unzip

ADDITIONAL REQUIREMENTS:

See PRE-REQUISITES and REQUIREMENT FILES before proceeding to the next steps

Micromamba Installation

Here we recommend using micromamba to set up a virtual environment to run seqsender. Micromamba is a tiny, statically linked C++ reimplementation of mamba which is an alternative to conda. The tool works as a standalone package manager that supports a subset of all mamba or conda commands, but it also has its own separate command line interfaces. For more information, visit micromamba documentation.

To manually install, download and unzip the executable from the official conda-forge package to your $HOME directory using tar.

cd $HOME

LINUX

# Linux Intel (x86_64):
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
# Linux ARM64:
curl -Ls https://micro.mamba.pm/api/micromamba/linux-aarch64/latest | tar -xvj bin/micromamba
# Linux Power:
curl -Ls https://micro.mamba.pm/api/micromamba/linux-ppc64le/latest | tar -xvj bin/micromamba

macOS

# macOS Intel (x86_64):
curl -Ls https://micro.mamba.pm/api/micromamba/osx-64/latest | tar -xvj bin/micromamba
# macOS Silicon/M1 (ARM64):
curl -Ls https://micro.mamba.pm/api/micromamba/osx-arm64/latest | tar -xvj bin/micromamba

After the extraction is completed, you can find the executable at $HOME/bin/micromamba

To quickly use micromamba, you can simply run

export MAMBA_ROOT_PREFIX="$HOME/micromamba"
eval "$($HOME/bin/micromamba shell hook -s posix)"

To persist using micromamba, you can append the following script to your .bashrc (or .zshrc)

# >>> mamba initialize >>>
export MAMBA_EXE="$HOME/bin/micromamba";
export MAMBA_ROOT_PREFIX="$HOME/micromamba";
__mamba_setup="$("$MAMBA_EXE" shell hook --shell bash --root-prefix "$MAMBA_ROOT_PREFIX" 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__mamba_setup"
else
    alias micromamba="$MAMBA_EXE"  # Fallback on help from mamba activate
fi
unset __mamba_setup
# <<< mamba initialize <<<

To check the current version of micromamba

micromamba --version
1.5.6

Set up a `micromamba` environment

Clone this repository to your $HOME directory

cd $HOME
git clone https://github.com/CDCgov/seqsender.git

CD to seqsender folder where the env.yaml file is stored. Let’s create a virtual environment named mamba that contains all dependencies needed to run seqsender from the source file.

cd seqsender
micromamba create --name mamba --file env.yaml

Activate the named environment – mamba

micromamba activate mamba

Run `seqsender` within the `mamba` environment

First, let’s look a list of commands in seqsender. Currently, there are five implemented commands in seqsender: prep, submit, check_submission_status, template, version.

python seqsender.py --help

To see the arguments required for each command, for example, the submit command, run

python seqsender.py submit --help

Submit a `test` submission

Rather than hastily jump in and submit a production submission right away, we can utilize GISAID’s and NCBI’s “TEST-SERVER” to upload a test submission first. That way submitter can familiarize themselves with the submission process prior to make a real submission.

Note: Duplicate test submissions will result in an error. Please create new sequence names each time you plan to run test submissions to avoid this issue.

Submit a `test` submission with a pre-processed dataset

Here we will go over the steps of preparing and batch uploading meta- and sequence-data to GISAID and NCBI databases using a pre-processed dataset provided with the software.

The template command will allow you to output examples of metadata and config files so you can base your submission on prior to upload a real submission. To get more help on the command, run

python seqsender.py template --help

1. Download the pre-processed meta- and sequence-data

python seqsender.py template \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission

--organism specifies the type of data to download. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options. Additional datasets for other organisms will be provided in future updates or requests.
-bsng is a combination flag of databases: Biosample (-b or --biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID (-g or --gisaid). This combination flag tells seqsender to generate an unified meta- and sequence-data into one file so we can perform batch upload to all databases simultaneously.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.

A quick look at the output files:

Here is the standard out of the command.

Generating submission template
Files are stored at: /home/snu3/flu-test-submission

Total runtime (HRS:MIN:SECS): 0:00:00.115140

2. Set up the config file – `config.yaml`

After the template is downloaded in (1), you can find config.yaml in your local $HOME/flu-test-submission directory. The config.yaml yaml file provides a brief description about the submission and contains user credentials that allow seqsender to authenticate the database prior to upload a submission.

Open that file with a text editor of your choice and fill in the appropriate information about your submission.

NOTE:

To submit to NCBI only, one can remove the GISAID Submission (b) section from the config file. Vice versa, to submit to GISAID only, just remove the NCBI Submission (a) section.
Submission_Position determines the order of the database in which we will submit first. For instance, if GISAID is set as 1, seqsender will submit to GISAID first, then after all samples are assigned with a GISAID accession number, seqsender will proceed to submit to NCBI. This order of submission ensures samples are linked correctly between the two databases after submission.
Username and Password under the NCBI Submission (b) section are the credentials used to authenticate the NCBI FTP Server (not to mistake with individual NCBI account). See PRE-REQUISITES for more details.

ADDITIONAL REQUIREMENTS:

If SRA is in your list of submitting databases, the raw reads for all samples must be provided and stored in a subfolder called raw_reads inside your submission directory of choice.
If GISAID is in your list of submitting databases, download the CLI package that associated with your organism of interest (e.g, Influenza A Virus (FLU) or SARS-COV-2 (COV)) from the GISAID platform and stored them in a subfolder called gisaid_cli inside your submission directory of choice.

A quick look of where to store the downloaded GISAID CLI package,

Important: Make sure you binary CLI package are executable. To allow executable permissions, run

chmod a+x <your_gisaid_cli_binary>

3. Upload a test submission

python seqsender.py submit \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test

--organism specifies the type of data to upload. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
-bsng is a combination flag of databases: Biosample (-b or --biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID (-g or --gisaid). This combination flag tells seqsender to prep and submit to each given database. See python seqsender.py submit --help for more details.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.
--config_file is the config file inside the --submission_name directory.
--metadata_file is the metadata file inside the --submission_name directory.
--fasta_file is the fasta file inside the --submission_name directory.
--test is used to submit to “TEST-SERVER ONLY” . For production submission, please remove this flag.

A quick look at the standard output.

Creating submission files for BIOSAMPLE
Files are stored at: /home/snu3/flu-test-submission/submission_files/BIOSAMPLE

Creating submission files for SRA
Files are stored at: /home/snu3/flu-test-submission/submission_files/SRA

Creating submission files for GENBANK
Files are stored at: /home/snu3/flu-test-submission/submission_files/GENBANK

Creating submission files for GISAID
Files are stored at: /home/snu3/flu-test-submission/submission_files/GISAID

Uploading submission files to NCBI-BIOSAMPLE
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.

Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'

Uploading submission files to NCBI-SRA
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.

Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'

Uploading submission files to GISAID-FLU
Performing a 'Test' submission with Client-Id: TEST-EA76875B00C3
If this is not a 'Test' submission, interrupts submission immediately.

Submission attempt: 1
Uploading successfully
Status report is stored at: /home/snu3/flu-test-submission/submission_report_status.csv
Log file is stored at: /home/snu3/flu-test-submission/submission_files/GISAID/gisaid_upload_log_attempt_1.txt

4. Check the status of a submission

After a submission is submitted, you can routinely check the status of the submission.

python seqsender.py check_submission_status \
--organism FLU \
--submission_dir $HOME \
--submission_name flu-test-submission \
--test

--organism specifies the type of data. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.
--test is used to submit to “TEST-SERVER ONLY” . For production submission, please remove this flag.

Here is a quick look at the standard output:

Checking submission status for:

Submission name: flu-test-submission
Submission organism: FLU
Submission type: Test

Submission database: GISAID
Submission status: processed-ok

Submission database: BIOSAMPLE
Pulling down report.xml
Submission status: submitted

Submission database: SRA
Pulling down report.xml
Submission status: submitted

Submission database: GENBANK
Submission status: ---

Total runtime (HRS:MIN:SECS): 0:00:08.213955

Here is a list of submission statuses and its meanings:

If at least one action has Processed-error, submission status is Processed-error

Otherwise if at least one action has Processing state, the whole submission is Processing

Otherwise, if at least one action has Queued state, the whole submission is Queued

Otherwise, if at least one action has Deleted state, the whole submission is Deleted

If all actions have Processed-ok, submission status is Processed-ok

Otherwise submission status is Submitted

Submit a `test` submission with your own dataset

Before you can perform a test submission with your own dataset, make sure you have the required files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) already prepared and stored in the submission directory of your choice.

1. Assemble your meta- and sequence-data

To prep for FLU submissions, select one of the databases below for more details

BioSample
SRA
Genbank
GISAID
Multiple databases

To prep for COV submissions, select one of the databases below for more details

BioSample
SRA
Genbank
GISAID
Multiple databases

After you have finished prepping for your database of choices in (a) or (b), create a submission folder and store all your metadata and sequence files there.

Here is a quick look at the folder structure

Finally, make sure additional requirements below are met before you can proceed to the next steps.

If SRA is in your list of submitting databases, the raw reads for all samples must be provided and stored in a subfolder called raw_reads inside your submission directory of choice.
If GISAID is in your list of submitting databases, download the CLI package that associated with your organism of interest (e.g, Influenza A Virus (FLU) or SARS-COV-2 (COV)) from the GISAID platform and stored them in a subfolder called gisaid_cli inside your submission directory of choice.

Here is an example of where to place the GISAID CLI package.

Important: Make sure you binary CLI package are executable. To allow executable permissions, run

chmod a+x <your_gisaid_cli_binary>

2. Upload a test submission

After all files are (i) are prepared, we can go ahead and upload the submission

python seqsender.py submit \
--organism FLU \
-bsng \
--submission_dir $HOME \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test

--organism specifies the type of data to upload. Currently, Influenza A Virus (FLU) and SARS-COV-2 (COV) are the only two options.
-bsng is a combination flag of databases: Biosample (-b or --biosample), SRA (-s or --sra), Genbank (-n or --genbank), and GISAID (-g or --gisaid). This combination flag tells seqsender to prep and submit to each given database. See python seqsender.py submit --help for more details.
--submission_dir is the directory where you store all of the submission histories.
--submission_name is the submission folder inside the --submission_dir directory where it contains all necessary files (such as config.yaml, metadata.csv, sequence.fasta, raw reads, etc.) in order to make a submission.
--config_file is the config file inside the --submission_name directory.
--metadata_file is the metadata file inside the --submission_name directory.
--fasta_file is the fasta file inside the --submission_name directory.
--test is used to submit to “TEST-SERVER ONLY” . For production submission, please remove this flag.

A quick look at the standard output.

Creating submission files for BIOSAMPLE
Files are stored at: /home/snu3/flu-test-submission/submission_files/BIOSAMPLE

Creating submission files for SRA
Files are stored at: /home/snu3/flu-test-submission/submission_files/SRA

Creating submission files for GENBANK
Files are stored at: /home/snu3/flu-test-submission/submission_files/GENBANK

Creating submission files for GISAID
Files are stored at: /home/snu3/flu-test-submission/submission_files/GISAID

Uploading submission files to NCBI-BIOSAMPLE
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.

Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'

Uploading submission files to NCBI-SRA
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.

Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'

Uploading submission files to GISAID-FLU
Performing a 'Test' submission with Client-Id: TEST-EA76875B00C3
If this is not a 'Test' submission, interrupts submission immediately.

Submission attempt: 1
Uploading successfully
Status report is stored at: /home/snu3/flu-test-submission/submission_report_status.csv
Log file is stored at: /home/snu3/flu-test-submission/submission_files/GISAID/gisaid_upload_log_attempt_1.txt

3. Check the status of a submission