Pipeline overview

The pipeline is built with Nextflow as its workflow manager.

Processes

FASTQC

  • FastQC gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the FastQC help pages.
  • Input: Raw and Trimmed short-read data.
  • Output: Quality metrics for raw and trimmed short-read data.
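
To make the per-position metrics above concrete, here is a minimal Python sketch of how a quality score distribution and per-base sequence content could be derived from FASTQ records. It is not FastQC's implementation; the function names, the toy reads, and the Phred+33 encoding are assumptions made for illustration.

```python
from collections import Counter

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_line]

def summarize_reads(reads):
    """Summarize a list of (sequence, quality_string) tuples.

    Returns the mean quality per read position and the percentage of
    each base per read position.
    """
    max_len = max(len(seq) for seq, _ in reads)
    qual_sums = [0] * max_len
    depth = [0] * max_len
    base_counts = [Counter() for _ in range(max_len)]
    for seq, qual in reads:
        for i, (base, q) in enumerate(zip(seq, phred_scores(qual))):
            qual_sums[i] += q
            depth[i] += 1
            base_counts[i][base] += 1
    mean_qual = [s / d for s, d in zip(qual_sums, depth)]
    base_pct = [
        {base: 100.0 * count / d for base, count in counts.items()}
        for counts, d in zip(base_counts, depth)
    ]
    return mean_qual, base_pct

# Hypothetical toy reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
toy_reads = [("ACGT", "IIII"), ("ACGA", "II5I")]
mean_qual, base_pct = summarize_reads(toy_reads)
print(mean_qual)
print(base_pct[3])
```

A drop in mean quality at one position, or a skewed base percentage, is the kind of signal the corresponding FastQC plots are meant to surface.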

NANOPLOT

  • NanoPlot gives general quality metrics about your sequenced reads. It is a plotting tool for long-read sequencing data and alignments.
  • Input: Raw and Trimmed long-read data.
  • Output: Quality metrics for long-read data.
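
One long-read summary statistic that NanoPlot reports is the read length N50. The sketch below shows how that number can be computed from a list of read lengths; the function name and the toy lengths are hypothetical, and this is not NanoPlot's own code.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("cannot compute N50 of an empty read set")

# Hypothetical read lengths in bases.
toy_lengths = [2000, 3000, 4000, 5000, 6000]
n50 = read_length_n50(toy_lengths)
print(n50)
```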

FASTP

  • fastp is a tool designed to provide fast, all-in-one preprocessing for FASTQ files. It is developed in C++ with multithreading support for high performance.
  • Input: Raw short- and long-read data.
  • Output: Adapter-trimmed reads for both long- and short-read data.

Qualimap/BAMQC

  • Qualimap examines sequencing alignment data in SAM/BAM files according to the features of the mapped reads. It provides an overall view of the data that helps to detect biases in the sequencing and/or mapping and eases decision-making for further analysis.
  • Input: BAM files from alignment step.
  • Output: Quality metrics and coverage statistics reports.
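
As an illustration of what a coverage statistic summarizes, the following sketch computes mean depth and coverage breadth from per-position depth values. It is a simplified stand-in rather than Qualimap's method; the function name, the `min_depth` cutoff, and the toy depths are assumptions.

```python
def coverage_summary(depths, min_depth=10):
    """Summarize per-position read depth.

    depths: one integer per reference position.
    min_depth: hypothetical cutoff for counting a position as covered.
    """
    if not depths:
        raise ValueError("no positions supplied")
    n_positions = len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean_depth": sum(depths) / n_positions,
        "breadth_at_min_depth": covered / n_positions,
    }

# Hypothetical depths for a five-base reference.
toy_depths = [0, 5, 10, 20, 15]
summary = coverage_summary(toy_depths)
print(summary)
```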

ALIGNMENT/MINIMAP2

  • Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.
  • Input: Trimmed reads from FASTP step.
  • Output: Aligned reads in BAM format.

SAMTOOLS

  • Samtools is a suite of programs for interacting with high-throughput sequencing data.
  • Input: BAM file from the ALIGNMENT step.
  • Output: Statistics on each BAM file and a reference index.

PRIMERTRIMMING

  • ivarTrim: iVar uses primer positions supplied in a BED file to soft clip primer sequences from an aligned and sorted BAM file. Following this, the reads are trimmed based on a quality threshold (default: 20).
  • AmpliconClip: Clips the ends of read alignments if they intersect with regions defined in a BED file. While this tool was originally written for clipping read alignment positions that correspond to amplicon primer locations, it can also be used in other contexts.
  • Input: Aligned BAM files.
  • Output: BAM files with primers trimmed.
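
The idea shared by both primer-trimming tools, clipping the ends of an alignment that fall inside primer regions from a BED file, can be sketched as follows. This is a simplified model and not the logic of iVar or samtools: it works on alignment coordinates only, ignores strand and CIGAR handling, and the function names and toy BED records are hypothetical.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples (0-based, half-open)."""
    primers = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        primers.append((chrom, int(start), int(end)))
    return primers

def clip_alignment(chrom, start, end, primers):
    """Clip an alignment [start, end) whose ends fall inside a primer region.

    Returns the clipped (start, end), or None if nothing remains.
    """
    new_start, new_end = start, end
    for p_chrom, p_start, p_end in primers:
        if p_chrom != chrom:
            continue
        if p_start <= start < p_end:   # alignment begins inside a primer
            new_start = max(new_start, p_end)
        if p_start < end <= p_end:     # alignment ends inside a primer
            new_end = min(new_end, p_start)
    if new_start >= new_end:
        return None
    return new_start, new_end

# Hypothetical primer pair on a reference named "ref".
toy_bed = "ref\t30\t54\tprimer_1_LEFT\nref\t385\t410\tprimer_1_RIGHT\n"
primers = parse_bed(toy_bed)
clipped = clip_alignment("ref", 30, 400, primers)
print(clipped)
```

In a real BAM file the clipped bases stay in the record and are marked as soft clipped, which is why downstream variant callers ignore them.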

VariantCalling

  • ivarVariantCalling: iVar uses the output of the samtools mpileup command to call variants, both single nucleotide variants (SNVs) and indels.
  • Freyja: Performs variant calling using samtools and iVar on a BAM file and generates relative lineage abundances from the resulting variants and depths.
  • Input: Primer trimmed BAM files.
  • Output: Variant calls and demixed sequences.
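
Frequency-based SNV calling from pileup counts can be sketched as below. This is an illustration of the general idea only, not iVar's algorithm: it ignores base quality and indels, and the `min_freq` and `min_depth` values are illustrative assumptions rather than the pipeline's configured thresholds.

```python
def call_variants(pileup, min_freq=0.03, min_depth=10):
    """Call SNVs from per-position base counts.

    pileup: list of (position, ref_base, {base: count}) tuples.
    Returns (position, ref_base, alt_base, frequency) for every
    non-reference base at or above min_freq, at positions with at
    least min_depth total observations.
    """
    calls = []
    for pos, ref, counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a frequency
        for base, count in sorted(counts.items()):
            if base == ref:
                continue
            freq = count / depth
            if freq >= min_freq:
                calls.append((pos, ref, base, freq))
    return calls

# Hypothetical pileup counts at three positions.
toy_pileup = [
    (100, "A", {"A": 90, "G": 10}),  # G at 10%: called
    (200, "C", {"C": 99, "T": 1}),   # T at 1%: below min_freq
    (300, "G", {"G": 2, "T": 3}),    # depth 5: below min_depth
]
calls = call_variants(toy_pileup)
print(calls)
```

Per-position frequencies and depths of this kind are the inputs from which lineage abundances are then estimated.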

MultiQC

  • MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
  • Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.
  • Input: FastQC data files.
  • Output: MultiQC report.

Execution Reports

  • Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.

Dependencies

  1. Install Nextflow (>=21.04.0)

  2. Install any other software your deployment strategy requires; see the docs for configuration-related information.
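
The Nextflow version requirement in step 1 can be checked programmatically. The sketch below compares version strings numerically; the function names are hypothetical, and it assumes the installed version string has already been obtained (for example from the output of `nextflow -version`).

```python
def parse_version(text):
    """Turn a version such as '21.04.0' or '23.10.0-edge' into a tuple of ints."""
    core = text.strip().split("-")[0]  # drop suffixes such as '-edge'
    return tuple(int(part) for part in core.split("."))

def meets_minimum(installed, minimum="21.04.0"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)

# Hypothetical installed versions.
for version in ("20.10.0", "21.04.0", "23.10.0-edge"):
    print(version, meets_minimum(version))
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "21.10.0" would sort before "21.4.0".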


Last update: 2024-05-16