Content developed by Ben Rambo-Martin and Kristine Lacek
Objectives
- Describe quality management system (QMS) components
  - Reproducibility
  - Version control
  - Documentation
- Apply QMS principles to next-generation sequencing (NGS) workflows
  - Pre-analytical
  - Analytical
Reproducibility
- Same input + same methods = same results
  - A core principle of computational science, and of science in general
- Record software versions
  - Aligners, samtools, variant callers, workflow managers
- Track changes with version control
  - Git for scripts and pipeline development
- Document parameters and references
  - Primer schemes
  - Quality thresholds
- Identify test data to run through new versions and pipelines
- Use workflow managers
  - Snakemake or Nextflow for structured, repeatable runs
- Leverage containers
  - Docker or Singularity images help keep environments consistent
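The "record software versions" step can be sketched as a small shell helper that writes a version manifest alongside each run. This is a minimal sketch, not a prescribed method: the tool list and the `record_version` name are illustrative, and it assumes each listed tool accepts a `--version` flag (not every bioinformatics tool does, so adjust per tool).

```shell
set -eu

# Print "<tool><TAB><first line of its --version output>".
# Tools that are not on PATH are recorded explicitly rather than skipped,
# so a missing dependency is visible in the manifest.
record_version() {
  tool=$1
  if command -v "$tool" >/dev/null 2>&1; then
    version=$("$tool" --version 2>&1 | head -n 1)
  else
    version="NOT INSTALLED"
  fi
  printf '%s\t%s\n' "$tool" "$version"
}

# Hypothetical tool list; replace with the tools your pipeline actually calls.
manifest=$(mktemp)
for tool in git samtools minimap2 snakemake; do
  record_version "$tool"
done > "$manifest"

cat "$manifest"
```

Storing the manifest next to the results (and committing it with the pipeline code) makes it possible to tell later which versions produced a given output.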
Documentation
- Explains how tools and pipelines work
  - Inputs, outputs, parameters, and assumptions
- Enables reproducibility
  - Others (and future you) can rerun the same analysis
- Reduces errors and misuse
  - Clear defaults and examples prevent incorrect runs
- Essential for collaboration
  - Shared understanding across labs, teams, and institutions
Best Practices in Writing Documentation
- README.md files — Overview, requirements, and basic usage
- Usage examples & command snippets — Copy-pasteable starting points
- Parameter descriptions — What flags do, expected values, and defaults
- Workflow diagrams — Visualize steps, inputs, and outputs
- Document data formats — FASTQ, FASTA, BAM, reference versions
- Record software versions — Tools, containers, workflow managers
- Include example datasets — Small test cases to validate setup
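Several of these practices (parameter descriptions, stated defaults, a copy-pasteable example) can live in a script's own help text. The sketch below is illustrative only: the script name `call_consensus.sh`, the flags, and the default values are hypothetical and not taken from any real tool.

```shell
set -eu

# Hypothetical defaults, stated once so the help text and the code agree.
DEFAULT_MIN_DEPTH=20
DEFAULT_MIN_QUAL=30

usage() {
  cat <<EOF
Usage: call_consensus.sh -i READS.fastq [-d MIN_DEPTH] [-q MIN_QUAL]

Options:
  -i FILE  Input reads in FASTQ format (required)
  -d INT   Minimum depth to call a base (default: $DEFAULT_MIN_DEPTH)
  -q INT   Minimum base quality, Phred scale (default: $DEFAULT_MIN_QUAL)
  -h       Show this help and exit

Example:
  call_consensus.sh -i sample.fastq -d 10
EOF
}

# Parse flags and print the resolved parameters so they can be logged.
parse_args() {
  OPTIND=1
  input=""
  min_depth=$DEFAULT_MIN_DEPTH
  min_qual=$DEFAULT_MIN_QUAL
  while getopts "i:d:q:h" opt; do
    case "$opt" in
      i) input=$OPTARG ;;
      d) min_depth=$OPTARG ;;
      q) min_qual=$OPTARG ;;
      h) usage; return 0 ;;
      *) usage >&2; return 2 ;;
    esac
  done
  if [ -z "$input" ]; then
    echo "error: -i is required" >&2
    usage >&2
    return 2
  fi
  echo "input=$input min_depth=$min_depth min_qual=$min_qual"
}

parse_args -i sample.fastq -d 10
# prints: input=sample.fastq min_depth=10 min_qual=30
```

Printing the resolved parameters at startup means the run log records which defaults were actually in effect.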
Best Practices in Using Documentation
- Start with the README
  - Understand the purpose, inputs, outputs, and limitations
- Check versions and dates
  - Ensure the documentation matches the tool or pipeline version you’re using
- Follow the example first
  - Run the provided test or example before real data
- Read parameter descriptions carefully
  - Defaults may not match your experiment
- Verify input formats
  - FASTQ, FASTA, BAM, sample sheets, reference builds
- Look for expected outputs
  - Confirm files and directories match the documented results
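As one way to act on "verify input formats", the sketch below runs a basic structural check on a FASTQ file before it goes into a pipeline. It is an assumption-laden sketch rather than a full validator: the `check_fastq` name is illustrative, and it assumes uncompressed, four-line-per-record FASTQ (no wrapped sequences).

```shell
set -eu
workdir=$(mktemp -d)

# A tiny two-read example file to run the check against.
cat > "$workdir/sample.fastq" <<'EOF'
@read1
ACGT
+
IIII
@read2
GGCA
+
IIII
EOF

# Return 0 if the file looks like four-line FASTQ, non-zero otherwise.
check_fastq() {
  awk '
    BEGIN { bad = 0 }
    NR % 4 == 1 && !/^@/  { bad = 1 }                 # header must start with @
    NR % 4 == 2           { seqlen = length($0) }     # remember sequence length
    NR % 4 == 3 && !/^\+/ { bad = 1 }                 # separator must start with +
    NR % 4 == 0 && length($0) != seqlen { bad = 1 }   # one quality score per base
    END { if (NR == 0 || NR % 4 != 0) bad = 1; exit bad }
  ' "$1"
}

if check_fastq "$workdir/sample.fastq"; then
  echo "sample.fastq: passes the basic FASTQ checks"
else
  echo "sample.fastq: malformed FASTQ" >&2
fi
```

A check like this catches truncated or mislabeled files early, before they produce confusing errors downstream.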
Git
Git is a version control system that tracks changes to files over time.
- Designed for collaboration
  - Multiple people can work on the same codebase
  - Think of it like a shared document (Microsoft Office, Google Docs)
- Keeps a history of your work
  - See what changed, when, why, and by whom
- Common in bioinformatics
  - Pipelines, scripts, documentation, and workflows
  - GitHub and GitLab are among the most widely used platforms
Core Git Concepts
| Concept | Description |
|---|---|
| Repository (repo) | A project tracked by Git |
| Commit | A saved snapshot of changes with a message |
| Branch | A parallel version of the code for development or testing |
| Remote repository | Shared copy on platforms like GitHub or GitLab |
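The repository, commit, and branch concepts in the table can be tried out in a throwaway directory. The following is a minimal sketch: the repository name, the `config.yaml` file, the `min_depth` setting, and the user identity are all hypothetical values chosen for illustration.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Repository: a project directory tracked by Git.
git init -q demo-pipeline
cd demo-pipeline
git config user.name "Example Analyst"        # placeholder identity
git config user.email "analyst@example.org"
main_branch=$(git symbolic-ref --short HEAD)  # default branch name varies by setup

# Commit: a saved snapshot with a message.
printf 'min_depth: 20\n' > config.yaml
git add config.yaml
git commit -q -m "Add initial quality thresholds"

# Branch: a parallel line of work for testing a change.
git checkout -q -b test-lower-depth
printf 'min_depth: 10\n' > config.yaml
git commit -q -am "Try a lower depth threshold"

# Switching back shows the original branch is untouched by the experiment.
git checkout -q "$main_branch"
git log --oneline --all
cat config.yaml
```

After switching back, `config.yaml` on the original branch still holds the first threshold, while the experimental value lives only on `test-lower-depth`.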
Common Git Commands
Create or get a repository:
- `git init`: start a repo
- `git clone <repo>`: copy an existing repo
Track and save changes:
- `git status`: see changes
- `git add <file>`: stage changes
- `git commit -m "message"`: save snapshot
Sync with others:
- `git pull`: get updates
- `git push`: share your changes
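To see these commands work together, the sketch below simulates two collaborators sharing one remote. It assumes a local bare repository standing in for a GitHub or GitLab remote; the directory names, file contents, and user identity are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository stands in for the shared remote.
git init -q --bare shared.git

# First collaborator: clone, add a file, commit, and push.
git clone -q shared.git analyst_a 2>/dev/null   # hides the "empty repository" warning
cd analyst_a
git config user.name "Analyst A"                # placeholder identity
git config user.email "analyst.a@example.org"
printf '# Demo pipeline\n' > README.md
git status --short                              # lists README.md as untracked
git add README.md
git commit -q -m "Add README"
git push -q origin HEAD
cd ..

# Second collaborator: clone the now non-empty remote.
git clone -q shared.git analyst_b

# First collaborator shares another change.
cd analyst_a
printf 'Run: ./pipeline.sh sample.fastq\n' > USAGE.md
git add USAGE.md
git commit -q -m "Document basic usage"
git push -q origin HEAD

# Second collaborator picks it up with a pull.
cd ../analyst_b
git pull -q
ls
```

After the pull, the second clone contains both `README.md` and `USAGE.md`, which is the same round trip that happens against a hosted remote.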