vignettes/running-mira.Rmd
running-mira.Rmd
This documentation assumes you have the four required Docker containers (IRMA, DAIS-Ribosome, spyne, and MIRA) already running. If not, please start with getting started and MIRA install instructions.
DO NOT PUT SPACES OR SLASHES IN YOU RUN FOLDER NAMES!
This screenshot shows “Ubuntu-18.04” but many users just
see “Ubuntu”. This can vary, particularly if you happen to have multiple
Ubuntu versions installed. If you have multiple Ubunut distributions
installed, move the fastqs to whichever “Ubuntu” contains
FLU_SC2_SEQUENCING within
home>USERNAME>FLU_SC2_SEQUENCING
Containers
tab
on the left sidebar. Assure that the following containers are running or
run each of them:
Run MIRA by clicking the blue link “8020:8050”. This will open MIRA into your default internet browser. You can “bookmark” this site in your browser.
Click the REFRESH RUN LISTING
button and select your
run from the dropdown box.
Next, click DOWNLOAD SAMPLESHEET TEMPLATE
and open
the resulting Excel spreadsheet
Your Barcode numbers
column will be inferred from
the fastq_pass folder inside of your run folder.
Enter your sample IDs in the “Sample ID” column
Next, select your Sample Type
. Most of your sample’s
are test samples, so each row defaults to test
, but select
- control
or + control
for your
negative and positive controls
respectively.
Save your samplesheet with changes in Excel
Back in MIRA, Click the “Drag and Drop your Samplesheet” to select your filled-in samplesheet.
Now select UNLOCK ASSEMBLY BUTTON
and click
START GENOME ASSEMBLY
. Now is a good time to go have a
coffee.
Watch IRMA Progress
.
Assembly will take some time, how long exactly is dependent on your
sample number, the number of raw sequencing reads, and the power of your
computer.
When IRMA has finished, it will say “IRMA is finished” and you can now click on “DISPLAY IRMA RESULTS”.
Review the distribution of reads assigned to each barcode. The ideal result would be a similar number of reads assigned to each test and positive control. However, it is ok to not have similar read numbers per sample. Samples with a low proportion of reads may indicate higher Ct of starting material or less performant PCR during library preparation. What is most important for sequencing assembly is raw count of reads and their quality.
Review the “Automatic Quality Control Decisions” heatmap. In addition to IRMA’s built in quality control, MIRA requires a minimum median coverage of 50x, a minimum coverage of the reference length of 90%, and less than 10 minor variants >=5%. These are marked in yellow to orange according to the number of these failure types. Samples that failed to generate any assembly are marked in red. In addition, premature stop codons are flagged in yellow. CDC does not submit sequences with premature stop codons, particularly in HA, NA or SARS-CoV-2 Spike. Outside of those genes, premature stop codons near the end of the gene may be ok for submission. Hover your mouse over the figure to see individual results.
– | – |
Review Irma Summary
. This summarizes the above
information, as well as detailed read assignment to specific referencs,
in a table. Columns can be sorted and filtered, ie.
>1000
. Click Export
to save this as an
Excel file.
Review genome coverage depth. The heatmap summarizes the mean
coverage per sample per gene. All the images in MIRA can be interacted
with. Click the gray buttons on the top right of the images. The camera
icon will save the image. Clicking on a sample in the heatmap, or
selecting one in the drop down menu will display two plots. On the left
is a “sankey plot” that shows the number of reads assigned to the
barcode, how many of those passed IRMA’s QC, and how many are assigned
to each gene reference. The plot on the right shows the sample’s
complete coverage per gene. Try toggling the
log y --- linear y
button. In log space, you likely need to
reset the view by clicking the gray button in the top right that looks
like an X inside of a box. You can also try panning and zooming the
plot.
Inspect amino acid variants against popular reference sequences.
Here, you may see some mutations that do not match Amino Acid letters.
Translation produces standard amino acid codes with the two non-standard exceptions (. and ~) listed below. The translation engine stops when it encounters a stop codon, but corrects itself to continue in-frame when a frameshift is encountered.
Character | Interpretation |
---|---|
. | Missing alignment data |
- | Gap in alignment (ex: deletion) |
~ | Partial codon (ex: frameshift) |
X | Ambiguous codon translation |
If you have frameshifts in your data, click here for instructions on how to fix them
Inspect minor variation at single nucleotides, insertions and deletions relative to the sample’s generated consensus sequence. These tables are for your own usage and not necessary to review in detail. They can be sorted and filtered and exported to excel sheets for further analyses.
Export IRMA’s amended consensus nucleotide sequences and amino acid fasta files. This includes only those passing MIRA’s QC criteria and are ready for submission to public databases!
The DOWNLOAD FASTAS
button will only return passing
samples. To investigate failed sequences, click the
DOWNLOAD FAILED FASTAS
button. Do not submit failed samples
to public repositories!
You can open and view these Amino Acid and Nucleotide fastas in any sequence viewing application