Content developed by Kristine Lacek

Global Sequence Alignment

  • Aligns sequences end to end
  • Forces alignment across the full length of both sequences
  • Best for similar-length, closely related sequences
    • Whole genes or full genomes
  • Penalizes gaps and mismatches across the entire length of both sequences
    • Missing regions reduce overall score
  • Classic algorithm: Needleman–Wunsch
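
The end-to-end behavior described above can be illustrated with a minimal Needleman–Wunsch sketch. The function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration, not values from this material; real tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment sketch: returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps are penalized, because the alignment must span both sequences.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because every position of both sequences must appear in the result, the missing base in the shorter sequence shows up as a gap that lowers the total score.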

MULTIPLE SEQUENCE ALIGNMENT

  • Multi-FASTA → aligned multi-FASTA

Local Sequence Alignment

  • Aligns the best matching regions only
  • Finds high-similarity subsequences
  • Best for sequences of different lengths
    • Reads vs reference, conserved domains
  • Ignores poorly matching regions
    • Unaligned ends are not penalized
  • Classic algorithm: Smith–Waterman
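
As a contrast with global alignment, here is a minimal Smith–Waterman sketch. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions. The two differences from the global case are that cell scores are floored at zero and that traceback starts at the highest-scoring cell rather than at the corner.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment sketch: returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # The first row and column stay at 0, so unaligned ends cost nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # restart: drop a poorly matching prefix
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score drops to 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAAAGATTACATTTT", "CCGATTACACC"))
```

Only the shared region is reported; the flanking bases that do not match are left out of the alignment and do not affect the score.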

GENOME ASSEMBLY

  • FASTQ → FASTA

Pairwise Sequence Alignment

  • Compares two sequences at a time
  • Identifies similarities and differences
  • Finds the optimal (highest-scoring) alignment under a chosen scoring scheme
  • Accounts for matches, mismatches, and gaps
  • Can be global or local:
    • Global → full-length comparison
    • Local → best matching region only
  • Used for:
    • Comparing a read to a reference
    • Gene-to-gene comparisons
    • Similarity scoring
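
One simple way to turn a finished pairwise alignment into a similarity score is to count matches, mismatches, and gaps column by column. The sketch below assumes the two strings are already aligned (equal length, with `-` for gaps). The function name `alignment_summary` is hypothetical, and percent identity is computed over all alignment columns, which is only one of several conventions in use.

```python
def alignment_summary(aligned_a, aligned_b):
    """Count matches, mismatches, and gap columns in an aligned pair."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gaps += 1          # insertion or deletion in one sequence
        elif x == y:
            matches += 1
        else:
            mismatches += 1    # substitution
    columns = len(aligned_a)
    # Assumption: identity is expressed relative to the full alignment length.
    identity = 100.0 * matches / columns if columns else 0.0
    return {"matches": matches, "mismatches": mismatches,
            "gaps": gaps, "identity": identity}


print(alignment_summary("ACGT-A", "ACCTTA"))
```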

Multiple Sequence Alignment

  • Aligns three or more sequences simultaneously
  • Identifies conserved and variable regions
  • Used to study evolutionary relationships
    • Compare strains, species, or gene families
  • Highlights mutations and conserved motifs
    • Detect SNPs, insertions, deletions
  • Common tools:
    • MAFFT
    • MUSCLE
    • Clustal Omega
  • Practical uses:
    • Compare viral genomes across samples
    • Build phylogenetic trees
    • Identify conserved primer or target regions
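
Once a tool such as MAFFT has produced the alignment, conserved and variable regions can be found by scanning its columns. The sketch below assumes the aligned sequences are already loaded as equal-length strings with `-` for gaps. The function name `classify_columns` is hypothetical, and a column counts as conserved only if every sequence has the same base there and none has a gap, which is a deliberately strict definition.

```python
def classify_columns(aligned_seqs):
    """Split alignment columns (0-based) into conserved and variable lists."""
    if not aligned_seqs:
        return [], []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    conserved, variable = [], []
    for col in range(length):
        residues = {seq[col] for seq in aligned_seqs}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)   # identical base in every sequence
        else:
            variable.append(col)    # substitution or indel in at least one sequence
    return conserved, variable


alignment = ["ACGT-A", "ACCTTA", "ACGTTA"]
print(classify_columns(alignment))
```

Runs of consecutive conserved columns are candidates for primer or target regions, while variable columns mark possible SNPs, insertions, or deletions.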

References

  • What is a reference?
    • A reference sequence serves as a standardized baseline for sequence comparisons.

Coordinate Space

  • A coordinate space numbers each position in a sequence, which fixes where the sequence starts and ends. Describing an alignment in reference coordinates lets researchers pinpoint exactly where mutations, genes, or other features are located relative to the reference.
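
To make the idea concrete, the sketch below maps each column of an alignment back to a reference position. It assumes 1-based reference coordinates and a reference row that uses `-` where the other sequence has an insertion. The function name `reference_positions` is hypothetical.

```python
def reference_positions(aligned_ref):
    """Map each alignment column to a 1-based reference position.

    Columns where the reference has a gap (an insertion relative to the
    reference) have no reference coordinate and are reported as None.
    """
    positions = []
    pos = 0
    for ch in aligned_ref:
        if ch == "-":
            positions.append(None)
        else:
            pos += 1               # advance only on real reference bases
            positions.append(pos)
    return positions


aligned_ref = "AC-GT"
aligned_query = "ACTGA"
coords = reference_positions(aligned_ref)
for ref_pos, r, q in zip(coords, aligned_ref, aligned_query):
    if r != q:
        print(ref_pos, r, q)
```

Alignment column numbers and reference positions drift apart after every gap in the reference row, which is why features are reported in reference coordinates rather than by alignment column.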