
Why Containerization Matters

Containerization has become the foundation of modern ML systems. By packaging your code, dependencies, and runtime environment into a single unit, containers solve the classic “works on my machine” problem and enable consistent deployments across development, staging, and production. For ML practitioners, containers provide:
  • Reproducibility: Freeze exact versions of Python, CUDA, system libraries, and your code
  • Portability: Run the same container locally, on Kubernetes, or serverless platforms
  • Isolation: Avoid dependency conflicts between different models or services
  • Scalability: Deploy multiple replicas and scale horizontally as needed

Core Concepts

Docker Basics

Docker is the most popular containerization platform. A Dockerfile defines your container image as layers:
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "train.py"]
Each instruction creates a layer that’s cached, making rebuilds faster. For ML workloads, consider:
  • Using minimal base images to reduce size
  • Installing heavy dependencies (PyTorch, TensorFlow) in early layers
  • Copying code last so changes don’t invalidate cached layers
Multi-stage builds let you compile or download models in one stage and copy only the artifacts to a smaller runtime image.
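A minimal multi-stage sketch (the image structure is the point here; paths and the serve.py entrypoint are illustrative):

```dockerfile
# Stage 1: install dependencies into an isolated prefix
FROM python:3.10-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# Stage 2: copy only the installed packages into a clean runtime image
FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
CMD ["python", "serve.py"]
```

Build tools, caches, and intermediate downloads stay in the builder stage, so the final image ships only what the application needs at runtime.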

Container Registries

After building images, push them to a registry:
  • GitHub Container Registry (ghcr.io): Free for public repos
  • Docker Hub: Popular but rate-limited for anonymous pulls
  • AWS ECR / GCP Artifact Registry: Integrated with cloud platforms
Version images with explicit tags (e.g., app:v1.2.3) to track what's running where. Mutable tags like app:latest are convenient for development but make it harder to know exactly which build is deployed.
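Pushing a versioned image to GitHub Container Registry might look like this (the user, org, and image names are placeholders; authentication uses a personal access token with the write:packages scope):

```shell
# Authenticate once per session
echo "$GITHUB_TOKEN" | docker login ghcr.io -u myuser --password-stdin

# Tag the locally built image for the registry, then push
docker tag myapp:latest ghcr.io/myorg/myapp:v1.2.3
docker push ghcr.io/myorg/myapp:v1.2.3
```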

Kubernetes Fundamentals

Kubernetes (K8s) orchestrates containers at scale. Key abstractions for ML:

Pods

The smallest deployable unit—usually one container, but can include sidecars for logging or proxies

Jobs

Run containers to completion (perfect for training runs)

Deployments

Maintain a desired number of replicas (ideal for serving APIs)

Services

Provide stable networking and load balancing across pods
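For serving, a Deployment and a Service typically go together: the Deployment keeps replicas running, and the Service routes traffic to them. A minimal sketch (names, image, and ports are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: server
        image: ghcr.io/myorg/model-server:v1.0
        ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: model-server
spec:
  selector:
    app: model-server
  ports:
  - port: 80
    targetPort: 8000
```

The Service's selector matches the pod labels, so traffic on port 80 is load-balanced across all three replicas on port 8000.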

Example: Running a Training Job

apiVersion: batch/v1
kind: Job
metadata:
  name: bert-training
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: ghcr.io/myorg/bert-trainer:v1.0
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
This Job requests one GPU and runs your training container once. K8s handles scheduling, retries on failure, and cleanup.
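Submitting and monitoring the Job above with kubectl might look like this (assuming the manifest is saved as job.yaml):

```shell
kubectl apply -f job.yaml           # submit the Job to the cluster
kubectl get jobs                    # check completion status
kubectl logs job/bert-training -f   # stream the training logs
kubectl delete job bert-training    # clean up when done
```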

Local Development

Use kind (Kubernetes in Docker) or minikube to run a full K8s cluster locally. This lets you test manifests before deploying to production.
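A typical kind workflow, sketched (the cluster and image names are placeholders):

```shell
kind create cluster --name ml-dev               # start a local cluster
docker build -t myapp:dev .                     # build the image locally
kind load docker-image myapp:dev --name ml-dev  # copy it onto the cluster's nodes
kubectl apply -f job.yaml                       # test your manifest
```

Loading the image directly into the cluster avoids pushing to a remote registry during iteration, since kind nodes cannot see your local Docker image cache by default.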

Cloud Platforms

Production K8s is typically managed:
  • AWS EKS: Integrates with IAM, EBS, and other AWS services
  • GCP GKE: Autopilot mode handles node management
  • Azure AKS: Good GPU support for ML workloads
For simpler needs, consider:
  • Google Cloud Run: Serverless containers (auto-scaling from zero)
  • AWS Fargate: Serverless compute for ECS/EKS
  • Railway / Modal: Developer-friendly platforms for pet projects
Serverless options can be more cost-effective for intermittent workloads, while full K8s gives you more control for high-throughput serving.

CI/CD Integration

Automate building and pushing containers:
# .github/workflows/build.yaml
name: Build Docker Image
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # required to push to ghcr.io
    steps:
      - uses: actions/checkout@v3
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build image
        run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
      - name: Push to registry
        run: docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
Popular CI/CD platforms include GitHub Actions, CircleCI, Jenkins, and GitLab CI.

Hands-On Examples

Explore practical containerization in Module 1:
  • Build ML and web app containers
  • Deploy to local Kubernetes with kind
  • Use k9s to monitor resources
  • Push images to GitHub Container Registry

Next Steps

Data Management

Learn how to store and version datasets

Model Serving

Deploy containers as production APIs
