NGS QC Knowledge Check

Test your understanding of NGS quality control concepts.


1. Why is Quality Control important in Next-Generation Sequencing?

All of these are valid reasons. QC ensures that expensive sequencing runs produce reliable data, prevents erroneous bases from distorting phylogenetic trees, and provides the confidence needed to call novel mutations accurately.

2. What does a Phred score measure?

A Phred score (Q) encodes the estimated probability (P) that a base call is incorrect, on a logarithmic scale: Q = -10 × log10(P). A Phred score of 30 means a 1 in 1,000 chance of error (99.9% accuracy). Higher scores indicate more confident base calls.
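The conversion between a Phred score and an error probability can be sketched in a few lines of Python. This is a minimal illustration, not part of any specific pipeline: the function names are hypothetical, and the quality-string decoder assumes the common Phred+33 ASCII encoding used in modern FASTQ files, which the quiz itself does not mention.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores.

    Assumes Phred+33 encoding by default (an assumption, not stated in the quiz).
    """
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:.4f}, accuracy {100 * (1 - p):.2f}%")
```

Each 10-point increase in Q corresponds to a tenfold drop in the error probability, which is why Q20 (1 in 100) and Q30 (1 in 1,000) are commonly used as filtering thresholds.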

3. What does coverage depth measure?

Coverage depth (or read depth) measures how many sequencing reads align to (cover) a given position in the reference genome. Higher depth provides more confidence in base calls and variant detection.
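To make the definition concrete, the sketch below counts how many reads cover each reference position. It assumes each aligned read is reduced to a hypothetical (start, end) pair in 0-based, half-open coordinates; real tools derive these intervals from BAM alignments and also account for gaps and clipping, which this example ignores.

```python
def depth_per_position(reads: list[tuple[int, int]], ref_length: int) -> list[int]:
    """Return the number of reads covering each reference position.

    reads: (start, end) intervals, 0-based and half-open (illustrative format).
    """
    # Difference array: +1 where a read starts, -1 where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, ref_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for i in range(ref_length):
        running += diff[i]
        depth.append(running)
    return depth


if __name__ == "__main__":
    example_reads = [(0, 5), (2, 8), (4, 10)]
    depth = depth_per_position(example_reads, ref_length=10)
    print(depth)
    print("mean depth:", sum(depth) / len(depth))
```

Depth is a per-position quantity, so a genome can have a high mean depth and still contain positions with little or no coverage.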

4. What metadata is mandatory for public database submission?

Public databases like GISAID require collection date and geographic location as mandatory metadata. Patient-level data (age, sex) is encouraged but not required, and clade is typically assigned by the database itself.

5. What kinds of files are viewed in IGV (Integrative Genomics Viewer)?

IGV is primarily used to visualize .bam files (aligned reads). BAM files contain read alignments mapped to a reference genome, allowing you to inspect coverage, variants, and read quality at specific genomic positions.

6. True or false: if there is amplicon drop-out in the laboratory, you can use complex bioinformatics methods to infer the missing read data from your sample.

False. Regions with no read data should be represented as missing (Ns) in the consensus, NOT filled with reference bases. Bioinformatics methods cannot invent data that was not sequenced. Reference-filling creates false consensus sequences that can mislead downstream analyses and introduce artificial similarity to the reference.
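The idea of reporting uncovered positions as N can be illustrated with a short sketch. It assumes a draft consensus string and a per-position depth list of the same length; the function name and the minimum-depth value of 10 are hypothetical choices for illustration, not a threshold taken from the quiz or from any particular pipeline.

```python
def mask_low_coverage(consensus: str, depth: list[int], min_depth: int = 10) -> str:
    """Replace bases at positions with insufficient depth by 'N'.

    min_depth is an illustrative value; use the threshold your pipeline defines.
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N"
        for base, d in zip(consensus, depth)
    )


if __name__ == "__main__":
    # Positions 2 and 3 have no reads (amplicon drop-out); the last is below threshold.
    print(mask_low_coverage("ACGTAC", [50, 40, 0, 0, 12, 9], min_depth=10))
```

Masking keeps the consensus honest: downstream tools see that data is missing there instead of a base that was never observed.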

7. What kinds of influenza samples should be sequenced?

For routine surveillance, randomly select samples with a Ct value below 28, which ensures sufficient viral genetic material for high-quality sequencing. Higher Ct values (lower viral load) often result in incomplete genomes with poor coverage.

8. Why does MIRA check the total number of minor variants as a QC metric?

An unusually high number of minor variants across multiple segments is a strong indicator of contamination or co-infection — meaning reads from two different viral populations are mixed in the same sample. This is flagged as a QC warning.

9. Which BLAST database would be more complete for Influenza samples?

GISAID contains a more comprehensive collection of influenza sequences because many submitters share data exclusively through GISAID before (or instead of) depositing in NCBI GenBank. This makes GISAID the preferred database for influenza BLAST searches.

10. An Influenza B sample was collected from a human in France in December 2025 and was sequenced in January 2026. Its identifier is sample number A123. How should it be named?

The correct influenza strain naming convention is: Type/Location/Identifier/Collection Year. For human samples, the host is omitted from the strain name. Since this is Influenza B collected from a human, it is B/France/A123/2025. The collection year (2025) is used, not the sequencing year (2026). The host field is only included for non-human isolates (e.g., swine, avian).
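The naming rule described above can be sketched as a small helper. The function and parameter names are hypothetical. Placing the host directly after the type for non-human isolates follows the standard influenza nomenclature; the subtype suffix often written for influenza A, such as (H3N2), is left out because the quiz does not cover it.

```python
from typing import Optional


def influenza_strain_name(
    virus_type: str,
    location: str,
    identifier: str,
    collection_year: int,
    host: Optional[str] = None,
) -> str:
    """Build a strain name as Type/[Host/]Location/Identifier/CollectionYear.

    The host is included only for non-human isolates.
    """
    parts = [virus_type]
    if host and host.lower() != "human":
        parts.append(host)
    parts += [location, identifier, str(collection_year)]
    return "/".join(parts)


if __name__ == "__main__":
    # Collected in 2025 and sequenced in 2026: the collection year is used.
    print(influenza_strain_name("B", "France", "A123", 2025, host="human"))
    # A non-human isolate keeps the host in the name.
    print(influenza_strain_name("A", "Iowa", "15", 1930, host="swine"))
```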

11. In your lab, you receive an Influenza sample that tests positive on RT-PCR for both Influenza A H3 and Influenza A H1. What would this sample be considered?

Co-infection. The patient is infected with two different Influenza A subtypes simultaneously (H3 and H1). This is detected at the RT-PCR stage before sequencing. Do not proceed with sequencing this sample — mixed populations will produce uninterpretable assemblies.

12. In your lab, you receive an Influenza sample that tests positive on RT-PCR for Influenza A H3. Following NGS, it assembles as H3N1 with a high minor variant count in all internal segments. What would this sample be considered?

Contamination. The RT-PCR only detected H3, but the assembly shows a mismatched neuraminidase (N1 instead of the expected N2) along with high minor variants across all internal segments. This pattern indicates laboratory contamination — reads from a different sample (likely H1N1) were mixed in during library preparation or sequencing. Do not submit this genome to GISAID!

13. In your lab, you receive an Influenza sample that tests positive on RT-PCR for Influenza A H3. Following NGS, it assembles as H3N1 with a small number of minor variants. All QC metrics pass in MIRA. You BLAST each segment and all are H3N2-like except for the NA segment, which is H1N1-like. You re-sequence the sample and get the same results. What would this sample be considered?

Reassortant! Key indicators: (1) QC metrics pass with low minor variants (clean assembly from a single population), (2) only ONE segment is mismatched (NA is H1N1-like while all others are H3N2-like), and (3) re-sequencing reproduces the same result. This virus acquired its NA segment from an H1N1 lineage through reassortment — a biologically real event worth reporting.
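The reasoning in questions 11 to 13 can be summarized as a simplified decision rule. The sketch below is only a teaching aid under stated assumptions: the function name, the inputs, and the order of the checks are hypothetical, it is not how MIRA or any other tool classifies samples, and real cases need expert review.

```python
def classify_sample(
    pcr_ha_subtypes: set[str],
    mismatched_segments: list[str],
    high_minor_variants: bool,
    qc_passed: bool,
    reproduced_on_resequencing: bool,
) -> str:
    """Classify a sample using the indicators described in questions 11 to 13."""
    # Two HA subtypes detected by RT-PCR before sequencing: co-infection.
    if len(pcr_ha_subtypes) > 1:
        return "co-infection"
    # Unexpected segment(s) together with many minor variants: mixed reads.
    if mismatched_segments and high_minor_variants:
        return "contamination"
    # One mismatched segment, clean QC, and a reproducible result.
    if (
        len(mismatched_segments) == 1
        and qc_passed
        and not high_minor_variants
        and reproduced_on_resequencing
    ):
        return "reassortant"
    # Nothing unexpected and QC is clean.
    if not mismatched_segments and qc_passed and not high_minor_variants:
        return "consistent"
    # Anything else needs further investigation, for example re-sequencing.
    return "undetermined"


if __name__ == "__main__":
    print(classify_sample({"H3", "H1"}, [], False, True, False))  # question 11
    print(classify_sample({"H3"}, ["NA"], True, False, False))    # question 12
    print(classify_sample({"H3"}, ["NA"], False, True, True))     # question 13
```

The "undetermined" branch matters: a single mismatched segment that has not yet been reproduced should prompt re-sequencing before any conclusion is drawn.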

14. What kind of pre-assembly QC does the MIRA pipeline perform for you?

MIRA performs all of these pre-assembly QC steps: it trims primer sequences, filters reads by quality score, removes reads that are too short, and can optionally generate FastQC/MultiQC reports for manual review of raw read quality.

15. What kind of post-assembly QC does the MIRA pipeline perform for you?

MIRA performs all of these post-assembly QC checks: it screens for contamination via minor variant counts, enforces minimum coverage depth thresholds, checks genome completeness, and scans coding regions for premature stop codons that may indicate assembly errors or pseudogenes.
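As an illustration of the last check, the sketch below scans a coding sequence for in-frame stop codons that occur before the final codon. It is not MIRA's implementation: the function name is hypothetical, and it assumes the input is a DNA coding sequence that starts in frame at the first base.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_positions(cds: str) -> list[int]:
    """Return 0-based start positions of in-frame stop codons before the last codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The final codon is expected to be a stop codon, so it is excluded.
    return [
        i * 3
        for i in range(n_codons - 1)
        if cds[i * 3 : i * 3 + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    print(premature_stop_positions("ATGAAATAA"))     # only the terminal stop
    print(premature_stop_positions("ATGTAGAAATAA"))  # TAG in the second codon
```

A hit from a scan like this is a prompt to inspect the underlying reads, since a premature stop can reflect either an assembly error or a genuine truncation.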
