Containers vs Virtual Machines

Containers:

  • Lightweight
  • Fast startup
  • Share the host operating system kernel

Virtual machines:

  • Heavier
  • Full operating system per instance
  • Slower startup

Why Containers in Bioinformatics?

  • Reproducibility across systems
  • Easier sharing of tools and workflows
  • Fewer dependency conflicts
  • Usable on HPC, cloud, and local machines

Key Definitions

Kernel — the core of an operating system; manages processes, memory, files, networking, and hardware. Containers share the host’s kernel, which is why they are lightweight. Linux containers need a Linux kernel (macOS/Windows use a hidden Linux VM).
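A quick way to see the shared kernel is to compare the kernel release reported by the host with the one reported inside a container. The sketch below assumes Docker is installed and uses ubuntu:24.04 purely as an illustrative image; it skips the container step when Docker is not usable.

```shell
#!/bin/sh
# Kernel release reported by the host.
host_kernel="$(uname -r)"
echo "host kernel:      ${host_kernel}"

# Kernel release reported from inside a container, if Docker is usable here.
# ubuntu:24.04 is only an illustrative image choice.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    container_kernel="$(docker run --rm ubuntu:24.04 uname -r)"
    echo "container kernel: ${container_kernel}"
else
    echo "docker not available; skipping the container check"
fi
```

On a Linux host the two values should match. On macOS or Windows the container reports the kernel of the Linux VM instead.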

Daemon — a background process that runs continuously, waiting for requests. Docker runs a daemon (dockerd); Podman and Apptainer are daemonless and run containers directly.

Runtime — the software that actually starts and runs a container.

  • High-level runtime (e.g., containerd) — manages the full lifecycle: pulling images, storage, networking, then delegates to a low-level runtime
  • Low-level runtime (e.g., runc, crun, Apptainer runtime) — talks to the kernel to create the isolated container process

Core Components of a Container System

Container Engine — user-facing CLI/API

  • Docker, Podman, Apptainer

High-level Runtime — manages the container lifecycle (pull, store, run)

  • containerd (a daemon, used by Docker); Podman does this work itself, with no daemon

Low-level Runtime — spawns the actual container process

  • runc, crun, Apptainer runtime

Linux VM layer (macOS/Windows only)

  • Lima, Docker Desktop VM, Podman Machine, WSL2
  • Provides the Linux kernel containers need

Images vs Containers

  • Image = read-only blueprint: a packaged snapshot of software and its dependencies
  • Container = running instance of an image

Tip

You run containers from images.

The Open Container Initiative (OCI)

The OCI defines open standards for containers:

  • Image Spec — how images are packaged (layers, manifest, configuration)
  • Runtime Spec — how containers run
  • Distribution Spec — how registries share images

Why it matters:

  • The same image works across Docker, Podman, and Apptainer
  • No vendor lock-in
  • Apptainer can pull images from Docker Hub (converting them to .sif as it does so) because they all follow OCI
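As a rough illustration, the same Docker Hub image can be requested from each engine with only the command prefix changing. pull_command is a hypothetical helper written for this sketch; it prints the commands rather than running them, so it works without any engine installed.

```shell
#!/bin/sh
# Print the pull command each engine would use for the same Docker Hub image.
# pull_command is a hypothetical helper written for this example.
pull_command() {
    engine="$1"
    image="$2"
    case "${engine}" in
        docker)    echo "docker pull ${image}" ;;
        podman)    echo "podman pull docker.io/${image}" ;;
        apptainer) echo "apptainer pull docker://${image}" ;;
        *)         echo "unknown engine: ${engine}" >&2; return 1 ;;
    esac
}

for engine in docker podman apptainer; do
    pull_command "${engine}" cdcgov/mira-nf:v2.1.0
done
```

The docker:// prefix tells Apptainer to fetch from a Docker-style registry.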

Be cautious of non-OCI formats:

  • Singularity-only .sif files with no OCI source — hard to audit or rebuild
  • Tarball “containers” with no manifest or recipe
  • VM images (.ova, .vmdk) — these are virtual machines, not containers

You find container images on container registries

A container registry is a storage location for container images. Registries let you:

  • Share tools easily
  • Access pre-built environments
  • Version software stacks
  • Support reproducible workflows

Common registries:

  • Docker Hub
  • Quay.io
  • GitHub Container Registry
  • Amazon ECR

Typical Workflow

  1. Find a container in a registry
  2. Pull or download the image
  3. Run analysis using the container

How Containers Run

Ephemeral (most common in bioinformatics)

  • Container starts, runs a task, then is automatically removed
  • docker run --rm — the --rm flag deletes the container when it exits
  • Ideal for pipelines: each step gets a fresh, clean environment

Interactive

  • You get a shell inside the container to explore or debug
  • docker run -it ubuntu bash — opens a terminal session inside the container
  • Useful for testing tools, inspecting files, or troubleshooting

Background (detached)

  • Container runs as a long-lived service (e.g., a database or web server)
  • docker run -d nginx — starts the container and returns you to your terminal
  • Less common in bioinformatics, but used for shared services like Galaxy or JupyterHub

Workflow Integration

Containers are standard in modern pipelines:

  • Nextflow
  • Snakemake
  • CWL / WDL

When containers are enabled in its configuration, Nextflow pulls the image each process needs and runs every step in its own container.
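A minimal sketch of what enabling containers can look like in Nextflow, assuming Docker as the engine and a single image for all processes. The scratch directory and the image choice are illustrative; a real pipeline usually ships its own configuration and profiles.

```shell
#!/bin/sh
# Write a minimal nextflow.config into a scratch directory.
config_dir="$(mktemp -d)"
cat > "${config_dir}/nextflow.config" <<'EOF'
// Run every process in this pinned image (illustrative choice).
process.container = 'cdcgov/mira-nf:v2.1.0'
// Ask Nextflow to launch processes through Docker.
docker.enabled = true
EOF

# Show the file that was written.
cat "${config_dir}/nextflow.config"
```

On HPC systems the docker scope would typically be swapped for the singularity or apptainer scope. Check the Nextflow documentation for the version you run.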

Versioning

Images are tagged, for example:

  • cdcgov/mira-nf:v2.1.0
  • cdcgov/irma-core:v0.9.1
  • cdcgov/irma:v1.3.2
  • cdcgov/dais-ribosome:v1.7.0
  • cdcgov/mira-oxide:v1.5.4
  • cdcgov/nextclade:v3.21.2

Warning

Avoid latest — pin a specific version for reproducible work.
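One way to enforce this in a script is to reject image references that carry no tag, or the latest tag, before anything is pulled. is_pinned below is a hypothetical helper, not part of Docker. It treats a reference as pinned when it has a digest (@sha256:...) or an explicit tag other than latest.

```shell
#!/bin/sh
# is_pinned is a hypothetical helper: succeeds when an image reference
# carries a digest or an explicit tag other than "latest".
is_pinned() {
    ref="$1"
    case "${ref}" in
        *@sha256:*) return 0 ;;          # pinned by digest
    esac
    name="${ref##*/}"                    # last path component, e.g. mira-nf:v2.1.0
    case "${name}" in
        *:latest) return 1 ;;            # explicit latest
        *:?*)     return 0 ;;            # explicit version tag
        *)        return 1 ;;            # no tag means latest is implied
    esac
}

for ref in cdcgov/mira-nf:v2.1.0 cdcgov/mira-nf cdcgov/mira-nf:latest; do
    if is_pinned "${ref}"; then
        echo "pinned:   ${ref}"
    else
        echo "unpinned: ${ref}"
    fi
done
```

Only the last path component is inspected, so a registry port such as localhost:5000/tool is not mistaken for a tag.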

Container Security Matters

Why think about security?

  • Containers can run arbitrary code
  • Images may include unknown or outdated software
  • Risks are higher in shared environments such as HPC or cloud systems

Important

Treat containers like executable software.

Use Trusted Sources

Good sources:

  • Official tool images
  • BioContainers
  • CDCgov Docker Hub

Important

Look for clear versioning, documentation, and active maintenance.

Red flags:

  • No documentation
  • Only latest tags
  • Unknown publishers
  • Images requiring unnecessary privileges

Root vs Rootless Containers

Root (default Docker): the Docker daemon runs as root, so anyone allowed to start containers effectively holds elevated privileges; risky on shared systems.

Rootless:

  • Podman — rootless by default; drop-in replacement for Docker commands
  • Apptainer — rootless by design; runs containers as simple user-space processes

HPC systems usually restrict root access — this is why Apptainer and Podman are preferred in research computing.

Best Practices for Users

  • Use trusted registries, such as CDCgov Docker Hub or BioContainers
  • Prefer rootless on shared systems
  • Pin versions; avoid latest tags in workflows
  • Be cautious with unknown images
  • Know where your data are mounted: the -v/--volume flag (Docker, Podman) or --bind (Apptainer)
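To make mounts explicit, a small helper can build the -v argument and mark inputs as read-only. mount_flag is a hypothetical function written for this sketch, and the /data and /reference paths are illustrative. The output is meant for inspection; host paths containing spaces would need extra quoting before being passed to docker run.

```shell
#!/bin/sh
# mount_flag is a hypothetical helper that prints a -v argument for docker run.
# Usage: mount_flag HOST_DIR CONTAINER_DIR [ro]
mount_flag() {
    # Resolve to an absolute path: Docker treats a non-absolute source
    # as a named volume rather than a host directory.
    host_dir="$(cd "$1" && pwd)" || return 1
    spec="${host_dir}:$2"
    if [ "${3:-}" = "ro" ]; then
        spec="${spec}:ro"        # read-only inside the container
    fi
    echo "-v ${spec}"
}

mount_flag . /data            # read-write: results can be written back
mount_flag . /reference ro    # read-only: the container cannot modify it
```

Mounting reference data read-only limits what an untrusted image can change on the host.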

Example: Docker and Apptainer

Run a trusted container with Docker:

docker run \
    --rm \
    --user "$(id -u):$(id -g)" \
    -v "${PWD}:/data" \
    cdcgov/mira-nf:v2.1.0 \
    nextflow run /MIRA-NF/main.nf \
        -profile mira_nf_container \
        --input /data/samplesheet.csv \
        --runpath /data \
        --outdir /data/mira-output \
        --e Flu-Illumina \
        --nextclade true

Lines:

  1. Run a container
  2. Remove it after it exits with --rm
  3. Run as your user ID — avoids root-owned output files
  4. Mount the current directory into /data inside the container
  5. Pinned image version from CDCgov Docker Hub
  6. Execute the MIRA-NF Nextflow pipeline inside the container
  7-12. Pipeline arguments: profile, samplesheet, run and output paths, assay type, and Nextclade flag
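For comparison, the same run might look like this under Apptainer. This is an untested, flag-by-flag translation of the Docker command above, not an invocation taken from the MIRA-NF documentation, so treat the whole command as an assumption. The sketch only prints the command; nothing is pulled or executed.

```shell
#!/bin/sh
# Hypothetical translation of the Docker command; not taken from the MIRA-NF docs.
image="docker://cdcgov/mira-nf:v2.1.0"   # docker:// pulls from a Docker-style registry

# No --rm or --user is needed: Apptainer runs as the calling user
# and leaves no stopped container behind. --bind replaces -v.
set -- apptainer exec \
    --bind "${PWD}:/data" \
    "${image}" \
    nextflow run /MIRA-NF/main.nf \
        -profile mira_nf_container \
        --input /data/samplesheet.csv \
        --runpath /data \
        --outdir /data/mira-output \
        --e Flu-Illumina \
        --nextclade true

apptainer_cmd="$*"
echo "${apptainer_cmd}"
# To execute instead of printing, run: "$@"
```

Apptainer starts in your current host directory, so Nextflow's working files may land somewhere different than they do under Docker.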

Summary

  • Containers — reproducibility, portability, ease of use
  • Registries — access, sharing, versioning
  • Security — use trusted sources, pin versions, prefer rootless

Do not install everything manually. Use trusted, secure containers.

Questions?