After the workshop, you are expected to apply the concepts and tools covered to your laboratory’s real influenza sequencing data. This practical will walk you through repeating the entire workflow on a set of publicly available samples. You should complete the exercises in this practical using your institution’s computational environment and resources. The goal is to gain confidence in executing the workflow end-to-end and to identify any gaps or challenges specific to your setup.
Post-workshop deliverables include completing the end-to-end workflow on your institution’s influenza sequencing data and reporting the results back to CDC and APHL. We expect two reports during your country’s upcoming influenza season, one mid-season and one post-season.
Each institution has a dedicated project tracker with milestones and issues that map directly to this practical. Find your institution’s repository below:
| Institution | Repository |
|---|---|
| Instituto De Salud Pública De Chile (ISP) | post-bfxwkshp-isp-chile |
| Laboratorio Nacional De Salud De Guatemala (LNS) | post-bfxwkshp-lns-guatemala |
| Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE) | post-bfxwkshp-indre-mexico |
| Ministerio De Salud Paraguay / Laboratorio Central De Salud Pública (LCSP) | post-bfxwkshp-lcsp-paraguay |
| Instituto Nacional De Salud Peru (INS) | post-bfxwkshp-ins-peru |
| Instituto Conmemorativo Gorgas de Estudios de la Salud (Gorgas) | post-bfxwkshp-gorgas-panama |
| Secretaría Nacional de Ciencia y Tecnología (SENACYT) | post-bfxwkshp-senacyt-panama |
Step 1 — Download Test Data
Public sequencing data are available through NCBI’s Sequence Read Archive (SRA). Your goal is to download paired-end FASTQ files for influenza samples under BioProject PRJNA1437047 that match the pattern H1_*.
Suggested tools: sra-toolkit (prefetch, fasterq-dump), Entrez Direct (esearch, efetch), SRA Run Selector
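One possible way to script the selection and download is sketched below in Python. It assumes you have already exported a run table (for example from SRA Run Selector) and reduced it to pairs of run accession and sample name; which metadata field actually carries the `H1_*` label is an assumption you should check against the BioProject. The accessions in the usage note are placeholders, and the function names are hypothetical. The commands are only built here, not executed.

```python
import fnmatch


def select_runs(rows, pattern="H1_*"):
    """Return run accessions whose sample name matches a shell-style pattern.

    `rows` is an iterable of (run_accession, sample_name) pairs, assumed to
    come from a run table you exported yourself.
    """
    return [run for run, sample in rows if fnmatch.fnmatchcase(sample, pattern)]


def download_commands(run, outdir="fastq"):
    """Build the sra-toolkit commands for one run without running them."""
    return [
        # Fetch the run from SRA into the current working directory.
        ["prefetch", run],
        # Convert to FASTQ, writing mates to <run>_1.fastq and <run>_2.fastq.
        ["fasterq-dump", "--split-files", "--outdir", outdir, run],
    ]
```

To execute the commands, you could loop over `download_commands(run)` for each selected run and pass each list to `subprocess.run(cmd, check=True)`, for example after `select_runs([("SRR0000001", "H1_sample_a"), ("SRR0000002", "H3_sample_b")])`.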
Step 2 — File Renaming and Directory Setup
Mira expects input files to follow a specific naming convention tied to a samplesheet. Files downloaded from SRA are named by run accession (e.g., SRR12345678_1.fastq), whereas Mira expects the standard Illumina convention (_R1/_R2). You need to rename the files, create the directory structure Mira expects, and write a correctly formatted samplesheet.csv.
Suggested tools: bash loops (while, for), mv, I/O redirection (>, >>), cut
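The renaming step can be sketched in Python as follows. This is a minimal illustration, not Mira's own tooling: the function names are hypothetical, the mapping from run accession to sample ID is something you supply, and the samplesheet columns (`Sample ID`, `Sample Type`) and the `Test` sample type are assumptions to verify against the Mira documentation before use.

```python
import csv
from pathlib import Path


def illumina_name(sra_filename, sample_id):
    """Map '<run>_1.fastq' to '<sample_id>_R1.fastq' (and _2 to _R2)."""
    path = Path(sra_filename)
    suffixes = "".join(path.suffixes)  # '.fastq' or '.fastq.gz'
    stem = path.name[: len(path.name) - len(suffixes)]
    run, _, mate = stem.rpartition("_")
    if mate not in ("1", "2") or not run:
        raise ValueError(f"not a paired SRA FASTQ name: {sra_filename}")
    return f"{sample_id}_R{mate}{suffixes}"


def rename_runs(fastq_dir, run_to_sample):
    """Rename every '<run>_[12].fastq*' file in fastq_dir; return the new names."""
    fastq_dir = Path(fastq_dir)
    renamed = []
    for run, sample_id in run_to_sample.items():
        # sorted() materializes the matches before any file is renamed.
        for old in sorted(fastq_dir.glob(f"{run}_[12].fastq*")):
            new = old.with_name(illumina_name(old.name, sample_id))
            old.rename(new)
            renamed.append(new.name)
    return renamed


def write_samplesheet(path, sample_ids,
                      columns=("Sample ID", "Sample Type"), sample_type="Test"):
    """Write a two-column samplesheet; the column names are an assumption."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(columns)
        for sample_id in sample_ids:
            writer.writerow([sample_id, sample_type])
```

Keeping the name mapping in a pure function (`illumina_name`) lets you check the new names on a few examples before any file on disk is touched.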
Step 3 — Genome Assembly with Mira
Mira-nf is the CDC’s influenza genome assembly and QC pipeline built on Nextflow. It can be run locally or on a cluster, and it uses containerized environments for reproducibility. For HPC environments, you may need to configure Nextflow profiles to specify resource requirements and execution parameters. CDC is here to help you configure it for your specific setup.
Step 4 — Mira QC: Pass vs. Fail
Mira generates quality metrics for each assembled segment and determines whether the assembly passes or fails based on predefined thresholds. You need to review the QC summary output, understand the metrics reported, and document them. You should be confident in explaining to your supervisors and colleagues why Mira’s quality thresholds are set where they are and how to interpret the results. You should also be able to identify any patterns in the QC failures that may indicate issues with specific segments or samples.
- Locate and examine the QC summary output from Mira. What columns/metrics are reported? What does each metric tell you about assembly quality?
- What thresholds does Mira use to define a “passing” assembly? Consider genome completeness, coverage depth, and ambiguous base counts.
- How many total reads did your negative control have, and what percent of those reads matched influenza?
- How many samples passed? How many failed? Are there any patterns in the failures (e.g., specific segments, low input material)?
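The pass/fail questions above can be answered with a short tally script. The sketch below assumes you have exported the QC summary to a CSV with one row per sample and segment; the column names (`sample`, `segment`, `status`) and the `pass` value are hypothetical placeholders, so adjust them to match the actual Mira output. A sample is counted as passing here only if none of its segments fail, which is also an assumption.

```python
import csv
from collections import Counter


def tally_qc(handle, sample_col="sample", segment_col="segment",
             status_col="status", pass_value="pass"):
    """Summarize per-segment QC rows into sample-level results.

    Returns (passed_samples, failed_samples, failures_by_segment), where
    failed_samples maps each failing sample to its failing segments.
    """
    failed_segments = {}
    for row in csv.DictReader(handle):
        failures = failed_segments.setdefault(row[sample_col], [])
        if row[status_col].strip().lower() != pass_value:
            failures.append(row[segment_col])
    passed = sorted(s for s, f in failed_segments.items() if not f)
    failed = {s: f for s, f in failed_segments.items() if f}
    # Counting failures per segment helps reveal segment-specific patterns.
    by_segment = Counter(seg for f in failed.values() for seg in f)
    return passed, failed, by_segment


def percent_matched(matched_reads, total_reads):
    """Percent of reads that matched influenza; 0.0 when there are no reads."""
    if total_reads == 0:
        return 0.0
    return 100.0 * matched_reads / total_reads
```

For example, `with open("qc_summary.csv") as fh: passed, failed, by_segment = tally_qc(fh)` would give you the counts to report, where `qc_summary.csv` is a placeholder file name.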
Step 5 — Extracting HA Sequences from Mira Output
The HA (hemagglutinin) segment is most commonly used for phylogenetic analysis. You need to extract the HA consensus sequences from samples that passed QC. HA is segment 4 for influenza A.
You should have a single FASTA file containing one HA consensus sequence per QC-passing sample.
Suggested bash tools: grep, >>
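As an alternative to grep, the extraction can be sketched in Python. The header layout assumed here, `>sample|segment` with `4` marking HA, is a guess for illustration only; inspect the headers in your own Mira consensus FASTA and adjust `sep` and `ha_label` accordingly. The function names are hypothetical.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_ha(handle, passing_samples, sep="|", ha_label="4"):
    """Keep HA records for QC-passing samples; returns (sample, sequence) pairs.

    Assumes headers look like '<sample><sep><segment>' (hypothetical layout).
    """
    kept = []
    for header, seq in read_fasta(handle):
        sample, _, segment = header.rpartition(sep)
        if sample in passing_samples and segment == ha_label:
            kept.append((sample, seq))
    return kept


def write_fasta(records, handle, width=70):
    """Write (name, sequence) pairs as FASTA, wrapping sequence lines."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

The set of passing samples would come from your Step 4 review, so that the output file contains exactly one HA sequence per QC-passing sample.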
Step 6 — Phylogenetic Analysis with Nextstrain
Nextstrain provides tools for phylogenetic analysis and visualization of pathogen genomic data. The core command-line toolkit is Augur, and interactive trees are viewed in Auspice.
Suggested tools: augur (align, tree, refine, ancestral, translate, export), auspice
- Create a metadata TSV for your samples with strain names, collection dates, and geographic info.
- Download a global reference dataset of H1 HA sequences from GISAID or GenBank to provide context for your samples.
- Align your HA sequences against the reference using `augur align`.
- Build a phylogenetic tree with `augur tree`, then refine it with temporal information using `augur refine --timetree`.
- (Optional) Reconstruct ancestral sequences and translate mutations onto the tree with `augur ancestral` and `augur translate`.
- Export the tree for visualization with `augur export` and view it in Auspice.
- Explore the interactive tree. What relationships, clusters, or outliers do you see among your samples?
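The required steps above can be laid out as a sequence of Augur commands. The Python sketch below writes a minimal metadata TSV and builds the command lists without running them. Augur expects `strain` and `date` columns in the metadata; the `country` column, the file names, and the `results` directory are illustrative assumptions. The `sequences` file is assumed to already combine your HA sequences with the context dataset, and `reference` is assumed to be a single reference sequence file. The optional `augur ancestral` and `augur translate` steps are omitted.

```python
import csv


def write_metadata(path, records):
    """Write a tab-separated metadata file with strain, date, and country."""
    fields = ["strain", "date", "country"]
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, delimiter="\t",
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)


def augur_commands(sequences, reference, metadata, workdir="results"):
    """Build the align, tree, refine, and export commands in order."""
    aligned = f"{workdir}/aligned.fasta"
    raw_tree = f"{workdir}/tree_raw.nwk"
    tree = f"{workdir}/tree.nwk"
    branch_lengths = f"{workdir}/branch_lengths.json"
    return [
        ["augur", "align", "--sequences", sequences,
         "--reference-sequence", reference, "--output", aligned, "--fill-gaps"],
        ["augur", "tree", "--alignment", aligned, "--output", raw_tree],
        # --timetree uses the dates in the metadata to scale the tree in time.
        ["augur", "refine", "--tree", raw_tree, "--alignment", aligned,
         "--metadata", metadata, "--output-tree", tree,
         "--output-node-data", branch_lengths, "--timetree"],
        ["augur", "export", "v2", "--tree", tree, "--metadata", metadata,
         "--node-data", branch_lengths, "--output", f"{workdir}/auspice.json"],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the `results` directory exists. The strain names in the metadata must match the FASTA headers exactly, otherwise Augur cannot attach dates to the tips.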
Step 7 — Reporting
Suggested tools: Markdown, pandoc, quarto
- Modify the provided report template to include your institution’s name, logo, and any specific sections relevant to your reporting needs.
- Populate the report with your analysis results, including a description of the samples analyzed and the phylogenetic tree visualizations.
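If you prefer to generate parts of the report programmatically, a small helper can turn your results into a Markdown table that pandoc or Quarto will render. This is only a sketch; the function name and the example columns are hypothetical and are not part of the provided template.

```python
def markdown_table(headers, rows):
    """Render headers and rows as a Markdown pipe table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)
```

For example, `markdown_table(["Sample", "QC"], [["A", "pass"], ["B", "fail"]])` returns a four-line table that can be pasted into the samples section of the report.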
Milestone Reference Summary
The exercises in this practical map to the following milestones and issues in your institution’s tracker repository (see the repository table above). In the tables below, “Part N” refers to Step N of this practical.
Milestone 1: Governance & Operations
| Issue | Title | Practical Section |
|---|---|---|
| #1 | Define turnaround time targets for analysis and reporting | Parts 3–7 (overall workflow timing) |
| #2 | Define sample prioritization criteria | Part 4 (QC pass/fail triage) |
| #3 | Define contingency plans for pipeline/infrastructure failure | Part 3 (pipeline execution) |
| #4 | Implement continuous monitoring metrics | Part 4 (QC metrics review) |
| #5 | Define metadata standards and change management for pipelines | Part 2 (file naming/metadata) |
Milestone 2: Setup & Validation
| Issue | Title | Practical Section |
|---|---|---|
| #6 | Define computational infrastructure | Parts 1, 3 (tool setup) |
| #7 | Implement reproducible environments | Part 3 (Nextflow profiles, containers) |
| #8 | Establish version control | Part 2 (directory structure, traceability) |
| #9 | Configure computational environment | Part 3 (Mira setup) |
| #10 | Validate analysis pipeline | Parts 3–4 (assembly + QC validation) |
Milestone 3: Operational Readiness
| Issue | Title | Practical Section |
|---|---|---|
| #11 | Identify sequence data sources | Part 1 (SRA data acquisition) |
| #12 | Define acceptance/rejection criteria | Part 4 (QC pass/fail thresholds) |
| #13 | Establish target sample throughput and define reporting period | Parts 3–4 (batch processing) |
| #14 | Develop SOPs | Part 5 (HA extraction procedure) |
| #15 | Implement automated workflows | Part 6 (Nextstrain pipeline) |
| #16 | Develop submission SOPs | Part 7 (reporting deliverables) |
| #17 | Define report templates | Part 7 (Quarto report template) |