Using reproducible environments in bioinformatics
Bioinformatics Activity Lead
Virology, Surveillance and Diagnosis Branch
Influenza Division
US Centers for Disease Control and Prevention
By the end of this session, you will:
Bioinformatics often involves:
Result:
A container is:
Think of it as:
A self-contained package that runs the same way on any machine with a compatible container runtime.
Containers:
Virtual machines:
Kernel — the core of an operating system; manages processes, memory, files, networking, and hardware. Containers share the host’s kernel, which is why they are lightweight. Linux containers need a Linux kernel (macOS/Windows use a hidden Linux VM).
Daemon — a background process that runs continuously, waiting for requests. Docker runs a daemon (dockerd); Podman and Apptainer are daemonless and run containers directly.
Runtime — the software that actually starts and runs a container.
Container Engine — user-facing CLI/API
High-level Runtime — daemon managing lifecycle (pull, store, run)
Low-level Runtime — spawns the actual container process
Linux VM layer (macOS/Windows only)
Tip
You run containers from images.
The Open Container Initiative (OCI) defines open standards for container images, runtimes, and distribution:
Why it matters:
Be cautious of non-OCI formats:
.sif files with no OCI source: hard to audit or rebuild.
Virtual machine images (.ova, .vmdk): these are virtual machines, not containers.
A container registry is a storage location for container images, such as Docker Hub, Quay.io, or GitHub Container Registry (ghcr.io).
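An image reference encodes which registry an image is pulled from. The bash sketch below shows roughly how Docker reads a reference such as cdcgov/mira-nf:v2.1.0, filling in Docker Hub defaults (the docker.io registry, the library/ namespace, and the latest tag) when parts are left out. parse_image_ref is a hypothetical helper written for illustration; it simplifies the full OCI reference grammar and does not validate names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an image reference into registry, repository,
# and tag/digest, applying Docker Hub's defaults when parts are omitted.
parse_image_ref() {
  local ref="$1" registry repo tag="" digest=""

  # A digest (@sha256:...) identifies exact image content.
  if [[ "$ref" == *@* ]]; then
    digest="${ref#*@}"
    ref="${ref%%@*}"
  fi

  # The first path component is a registry host only if it looks like one:
  # it contains "." or ":" (a port), or is exactly "localhost".
  local first="${ref%%/*}"
  if [[ "$ref" == */* && ( "$first" == *.* || "$first" == *:* || "$first" == localhost ) ]]; then
    registry="$first"
    repo="${ref#*/}"
  else
    registry="docker.io"
    repo="$ref"
  fi

  # A tag can only appear in the last path component, so a registry
  # port such as localhost:5000 is never mistaken for a tag.
  local last="${repo##*/}"
  if [[ "$last" == *:* ]]; then
    tag="${last##*:}"
    repo="${repo%:*}"
  fi

  # Official Docker Hub images live under the "library/" namespace.
  if [[ "$registry" == docker.io && "$repo" != */* ]]; then
    repo="library/$repo"
  fi

  # With neither a tag nor a digest, Docker falls back to "latest".
  if [[ -z "$tag" && -z "$digest" ]]; then
    tag="latest"
  fi

  echo "registry=$registry repository=$repo tag=${tag:-none} digest=${digest:-none}"
}
```

For example, `parse_image_ref ubuntu` prints `registry=docker.io repository=library/ubuntu tag=latest digest=none`, which shows that an unqualified, untagged name silently resolves to the latest tag on Docker Hub.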
Ephemeral (most common in bioinformatics)
docker run --rm: the --rm flag deletes the container when it exits
Interactive
docker run -it ubuntu bash: opens a terminal session inside the container (-i keeps input open, -t allocates a terminal)
Background (detached)
docker run -d nginx: starts the container and returns you to your terminal
Containers are standard in modern pipelines:
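When container support is enabled (for example, with the -with-docker option or a docker or apptainer profile), Nextflow pulls the image declared for each process and runs each step in its own container.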
Images are tagged, for example:
cdcgov/mira-nf:v2.1.0
cdcgov/irma-core:v0.9.1
cdcgov/irma:v1.3.2
cdcgov/dais-ribosome:v1.7.0
cdcgov/mira-oxide:v1.5.4
cdcgov/nextclade:v3.21.2
Warning
Avoid the latest tag: it can point to a different image over time. Pin a specific version for reproducible work.
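One way to enforce this is a check over the image references a workflow uses. The bash sketch below treats a reference as pinned if it carries a digest (@sha256:...) or an explicit tag other than latest; is_pinned and check_images are hypothetical helper names. Tags are mutable, so a maintainer can re-push the same version tag; only a digest guarantees identical image content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an image reference is pinned, i.e. it
# carries a digest (@sha256:...) or an explicit tag other than "latest".
is_pinned() {
  local ref="$1"

  # A digest fixes the exact image content, so it always counts as pinned.
  if [[ "$ref" == *@sha256:* ]]; then
    return 0
  fi

  # Only the last path component can carry a tag; an earlier ":" belongs
  # to a registry port (e.g. localhost:5000/...).
  local last="${ref##*/}"
  if [[ "$last" != *:* ]]; then
    return 1  # no tag at all: Docker silently uses "latest"
  fi

  local tag="${last##*:}"
  [[ -n "$tag" && "$tag" != latest ]]
}

# Hypothetical helper: report every unpinned reference on stderr.
# Returns 0 if all references are pinned, 1 otherwise.
check_images() {
  local ref unpinned=0
  for ref in "$@"; do
    if ! is_pinned "$ref"; then
      echo "unpinned: $ref" >&2
      unpinned=$((unpinned + 1))
    fi
  done
  (( unpinned == 0 ))
}
```

For example, `check_images cdcgov/mira-nf:v2.1.0 ubuntu` reports `unpinned: ubuntu` on standard error and returns a nonzero status, so it can stop a pre-run script or CI job.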
Why think about security?
Important
Treat containers like executable software.
Good sources:
Important
Look for clear versioning, documentation, and active maintenance.
Red flags:
latest tags
Root (default Docker): elevated privileges; risky on shared systems.
Rootless:
HPC systems usually restrict root access — this is why Apptainer and Podman are preferred in research computing.
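On HPC systems, Docker images are usually converted to Apptainer's .sif format. Building the .sif from a pinned OCI image keeps it traceable to an auditable source. The bash sketch below only prints the `apptainer pull` command rather than running it; sif_pull_cmd is a hypothetical helper, and the output file name (name_tag.sif) is a convention chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (without running) the Apptainer command that
# converts a tagged OCI/Docker image into a local .sif file.
sif_pull_cmd() {
  local ref="$1"

  # Refuse untagged references so the .sif name always records a version.
  local last="${ref##*/}"
  if [[ "$last" != *:* ]]; then
    echo "error: '$ref' has no tag; pin a version first" >&2
    return 1
  fi

  # e.g. cdcgov/mira-nf:v2.1.0 -> mira-nf_v2.1.0.sif
  local name="${last%%:*}" tag="${last##*:}"
  local sif="${name}_${tag}.sif"

  # Syntax: apptainer pull <output.sif> docker://<reference>
  printf 'apptainer pull %s docker://%s\n' "$sif" "$ref"
}
```

For example, `sif_pull_cmd cdcgov/mira-nf:v2.1.0` prints `apptainer pull mira-nf_v2.1.0.sif docker://cdcgov/mira-nf:v2.1.0`. The resulting file can then be used without root, for example with `apptainer exec mira-nf_v2.1.0.sif <command>`.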
latest tags in workflows
--volume or -v flags
Run a trusted container with Docker:
--rm
/data inside the container
Do not install everything manually. Use trusted, secure containers.
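The run pattern above (--rm, with data mounted at /data inside the container) can be wrapped in a small bash helper so every run looks the same. This is a sketch: docker_run_cmd is a hypothetical name, it prints the command instead of executing it, and the --user flag is an added assumption to keep output files from being owned by root; some images expect to run as root and may need that flag removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble (but do not run) a "docker run" command.
#   --rm           delete the container when it exits
#   --user UID:GID run as the calling user rather than root (assumption),
#                  so files written to the mount are not owned by root
#   -v DIR:/data   mount only the analysis directory, at /data
#   -w /data       start the command inside the mounted directory
docker_run_cmd() {
  if (( $# < 3 )); then
    echo "usage: docker_run_cmd IMAGE HOST_DIR COMMAND [ARGS...]" >&2
    return 2
  fi
  local image="$1" host_dir="$2"
  shift 2  # everything left is the command to run inside the container

  if [[ ! -d "$host_dir" ]]; then
    echo "error: '$host_dir' is not a directory" >&2
    return 1
  fi

  # Bind mounts need an absolute host path.
  local abs_dir
  abs_dir="$(cd "$host_dir" && pwd)"

  local cmd=(docker run --rm
    --user "$(id -u):$(id -g)"
    -v "${abs_dir}:/data"
    -w /data
    "$image" "$@")

  # %q shell-quotes each word, so the printed line can be pasted into a shell.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}
```

To execute the command instead of printing it, replace the two printf lines with `"${cmd[@]}"`.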
Questions?