Practice Exercise

Overview

This practice module combines all concepts from Module 1 into real-world tasks. You’ll draft an ML system design, containerize applications, build CI/CD pipelines, and deploy to Kubernetes.

These exercises mirror real production scenarios. Take your time, experiment, and don’t hesitate to reference the documentation pages.

Two-Part Structure

Module 1 practice is divided into two major components:

H1: Initial Design Draft - Plan your ML system architecture
H2: Infrastructure - Implement containerization and deployment

H1: Initial Design Draft

Before writing any code, design your ML system. This is critical for production systems where poor architecture leads to technical debt.

Reading List

Core Reading

Ml-design-docs - Templates and examples
How to Write Design Docs for Machine Learning Systems - Comprehensive guide
Design Docs at Google - Industry practices

Project Management

Best Practices

Technical Debt & Testing

The ML Test Score: A Rubric for ML Production Readiness
datascience-fails - Common pitfalls

Practical Resources

CS 329S Lecture 1. Machine Learning Systems in Production
You Don’t Need a Bigger Boat - Right-sized ML infrastructure
Why to Hire Machine Learning Engineers, Not Data Scientists

Task: Write Your Design Document

Create a comprehensive design document for an ML system using the MLOps template. You can use a real system from your work or create a fictional but realistic one. Your design doc must cover:

Architecture
Operations
Planning
Assessment

Models in Production

What models are deployed?
How do they interact?
Data flow and dependencies

Pros/Cons

Architecture strengths
Known limitations
Trade-offs made

Reference Design Example

See this example design document for inspiration.

Acceptance Criteria

✅ Approve - Document thoroughly addresses all required sections
❌ No approval - Missing critical sections or insufficient detail

You’ll repeat this task at the end of the course for your final coursework. Use this practice to develop good habits and templates.

H2: Infrastructure

Now implement the infrastructure for your ML system using Docker, Kubernetes, and CI/CD.

Reading List

Docker Fundamentals

0 to production-ready: Docker packaging best practices - Video
Docker and Python: Data Science and ML - Video
Docker introduction - Tutorial
Overview of Docker Hub - Registry guide

CI/CD

Introduction to GitHub Actions
Course: CI/CD for Machine Learning (GitOps) - Free W&B course

Kubernetes

Learn Kubernetes Basics - Official tutorial
Hello Minikube - Quick start
Kind Quick Start - Local clusters
Book: Kubernetes in Action - Comprehensive guide

Advanced Topics

Why data scientists shouldn’t need to know Kubernetes
Scaling Kubernetes to 7,500 nodes - OpenAI case study

Task Breakdown

Complete these three pull requests to your repository:

PR1: Dockerfile and Registry

Create a Dockerfile for a simple ML application and push the image to a container registry.

Requirements:

Write a Dockerfile with a basic web server or ML script
Push image to GitHub Container Registry (ghcr.io) or Docker Hub
Tag image appropriately (e.g., v1.0.0, latest)

Example structure:

FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

CMD ["python", "app.py"]

Verification:

# Pull and run your image
docker pull ghcr.io/yourusername/app:latest
docker run --rm ghcr.io/yourusername/app:latest
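
The example Dockerfile copies an app.py that is not shown on this page. A minimal sketch of what it could contain, assuming a standard-library web server listening on port 8080 (the containerPort used in the PR3 Deployment example). The predict() function is a hypothetical placeholder for a real model, and make_server() is an illustrative helper, not part of the course material.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 8080  # assumption: matches containerPort in the example Deployment


def predict(features):
    """Hypothetical stand-in for a model: returns the mean of the inputs."""
    return sum(features) / len(features) if features else 0.0


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a small JSON payload so curl has something to show.
        payload = {"status": "ok", "prediction": predict([1.0, 2.0, 3.0])}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=PORT):
    # Bind to 0.0.0.0 so the server is reachable from outside the container.
    return HTTPServer(("0.0.0.0", port), Handler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Because this sketch uses only the standard library, requirements.txt can be empty, but the file still has to exist for the COPY step in the Dockerfile to succeed.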

PR2: GitHub Actions CI/CD

Create a GitHub Actions workflow that builds and pushes your Docker image on every PR.

Requirements:

Workflow triggers on pull requests
Builds Docker image
Runs basic tests (if applicable)
Pushes to container registry
Displays green checkmark on success

Starter workflow:

name: CI/CD Pipeline

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Log in to registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true  # PRs from forks get a read-only GITHUB_TOKEN, so pushing fails there
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
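
One pitfall with the tags line: container image names must be lowercase, while github.repository keeps the capitalization of the owner and repository name. The sketch below shows the normalization such a tag needs. The function name ghcr_image_ref and the sample inputs are hypothetical, chosen only for illustration.

```python
def ghcr_image_ref(repository: str, tag: str) -> str:
    """Build a GHCR image reference from an 'owner/name' repository string."""
    if "/" not in repository:
        raise ValueError("repository must look like 'owner/name'")
    if not tag:
        raise ValueError("tag must not be empty")
    # Registries reject uppercase characters in the image name, so lowercase it.
    return f"ghcr.io/{repository.lower()}:{tag}"


if __name__ == "__main__":
    print(ghcr_image_ref("YourUserName/App", "v1.0.0"))
```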

Verification:

Check the “Actions” tab in GitHub
Ensure workflow completes successfully
Verify image appears in GitHub Packages
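
For the "Runs basic tests" requirement, a test file can be very small. The following is a sketch of a tests/test_app.py that uses only the standard library's unittest. To keep the example self-contained it defines a hypothetical predict() inline; in a real repository you would import the function from your application module instead.

```python
import unittest


def predict(features):
    """Hypothetical stand-in for application code: mean of the inputs."""
    return sum(features) / len(features) if features else 0.0


class PredictTests(unittest.TestCase):
    def test_mean_of_inputs(self):
        self.assertEqual(predict([1.0, 2.0, 3.0]), 2.0)

    def test_empty_input_returns_zero(self):
        self.assertEqual(predict([]), 0.0)


if __name__ == "__main__":
    unittest.main()
```

A workflow step could then run the suite with python -m unittest before the build step, so a failing test stops the image from being pushed.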

PR3: Kubernetes Manifests

Write Kubernetes YAML definitions and test them on a local cluster.

Requirements: Create manifests for:

Pod: Single container instance
Deployment: Replicated application
Service: Network access to deployment
Job: One-time batch task

File structure:

k8s/
├── pod.yaml
├── deployment.yaml
├── service.yaml
└── job.yaml

Example Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-app
  template:
    metadata:
      labels:
        app: ml-app
    spec:
      containers:
      - name: app
        image: ghcr.io/yourusername/app:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
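
Only the Deployment is shown here, so this is a rough sketch of the fields a matching Service and Job would need, written as Python dictionaries and dumped to JSON (kubectl accepts JSON manifests as well as YAML). It assumes the ml-app name, app: ml-app label, and port 8080 from the Deployment example. The Job name, its command, the backoffLimit value, and the output file names are illustrative assumptions. The exercise itself still asks for YAML files.

```python
import json
from pathlib import Path

APP = "ml-app"                             # from the Deployment example
IMAGE = "ghcr.io/yourusername/app:latest"  # placeholder image from the example


def service_manifest(name=APP, port=8080):
    """Service that routes to pods labelled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Must match the pod template labels, or the Service has no endpoints.
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def job_manifest(name=f"{APP}-job", image=IMAGE, command=None):
    """One-time batch task; the command here is a hypothetical placeholder."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,  # assumption: retry a failed pod at most twice
            "template": {
                "spec": {
                    # Jobs require restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "job",
                        "image": image,
                        "command": command or ["python", "-c", "print('batch job done')"],
                    }],
                }
            },
        },
    }


def write_manifests(out_dir="k8s"):
    """Write both manifests as JSON files and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for filename, manifest in (("service.json", service_manifest()),
                               ("job.json", job_manifest())):
        path = out / filename
        path.write_text(json.dumps(manifest, indent=2) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for written in write_manifests():
        print(written)
```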

Testing with kind:

# Create cluster
kind create cluster --name ml-test

# Deploy resources
kubectl apply -f k8s/

# Verify
kubectl get all
kubectl logs deployment/ml-app

# Test service (port-forward blocks, so run curl in a second terminal)
kubectl port-forward svc/ml-app 8080:8080
curl http://localhost:8080
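
Right after a deploy the application may need a few seconds before it answers, so a single curl can fail even when everything is configured correctly. A small polling script is one way to handle that. This is a sketch that assumes the port-forward above is running in another terminal; the function name wait_for_http and its default values are illustrative.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, attempts=10, delay=1.0, timeout=2.0):
    """Return True once url answers with HTTP 200, False if attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or an HTTP error status: retry.
            pass
        time.sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_for_http("http://localhost:8080")
    print("service reachable" if ok else "service not reachable")
    raise SystemExit(0 if ok else 1)
```

The non-zero exit code on failure makes the script usable as a check in CI as well as by hand.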

Bonus: Install k9s

Install k9s for a better Kubernetes management experience:

brew install derailed/k9s/k9s
k9s

k9s shortcuts:

:pods - View pods
:deploy - View deployments
:svc - View services
:jobs - View jobs
l - View logs
d - Describe resource
ctrl-d - Delete resource

k9s can make Kubernetes debugging noticeably faster, because pods, logs, and resource descriptions are a keystroke away instead of a separate kubectl command each. Many experienced developers install it early.

Acceptance Criteria

✅ Pass - All three PRs meet requirements:

PR1:
- Dockerfile builds successfully
- Image pushed to registry
- Image runs correctly when pulled
PR2:
- GitHub Actions workflow exists
- Workflow runs on PRs
- All jobs complete successfully (green checkmark)
PR3:
- All four resource types defined (Pod, Deployment, Service, Job)
- Resources deploy to kind/minikube successfully
- Application accessible via Service

Tips and Common Issues

Docker Troubleshooting

Build Failures

# Clear cache and rebuild
docker builder prune -a
docker build --no-cache -t app:latest .

# Check build context size
docker build -t app:latest . --progress=plain

Registry Auth

# GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

# Verify login
docker pull ghcr.io/yourusername/test:latest

Image Size

# Use slim base images
FROM python:3.12-slim

# Multi-stage builds
FROM python:3.12 AS builder
RUN pip install --user package

FROM python:3.12-slim
COPY --from=builder /root/.local /root/.local

Kubernetes Troubleshooting

Pod Won't Start

# Check pod status
kubectl describe pod pod-name

# View events
kubectl get events --sort-by=.metadata.creationTimestamp

# Check logs
kubectl logs pod-name
kubectl logs pod-name --previous  # If pod crashed

Image Pull Errors

# Make image public in GitHub
# Or create image pull secret
kubectl create secret docker-registry ghcr-secret \
  --docker-server=ghcr.io \
  --docker-username=USERNAME \
  --docker-password=$GITHUB_TOKEN

# Reference in pod spec
spec:
  imagePullSecrets:
  - name: ghcr-secret

Service Not Accessible

# Check service endpoints
kubectl get endpoints service-name

# Verify labels match
kubectl get pods --show-labels

# Test from within cluster (run the wget below inside the debug shell)
kubectl run -it --rm debug --image=alpine --restart=Never -- sh
wget -O- http://service-name:8080
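
The "verify labels match" check comes down to a simple rule: a Service sends traffic to a pod only if every key/value pair in the Service's selector also appears in the pod's labels. A minimal sketch of that rule, assuming a non-empty selector; the pod names and labels below are made up for illustration.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every selector key/value pair is present in the pod labels."""
    return all(labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    service_selector = {"app": "ml-app"}
    # Hypothetical pods: the second one has a typo in its label value.
    pods = {
        "ml-app-abc": {"app": "ml-app", "pod-template-hash": "abc"},
        "typo-pod": {"app": "mlapp"},
    }
    matching = [name for name, labels in pods.items()
                if selector_matches(service_selector, labels)]
    print(matching)  # ['ml-app-abc']
```

Extra labels on the pod are fine; a missing or misspelled one leaves the Service with no endpoints for that pod.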

GitHub Actions Troubleshooting

Workflow Won't Run

Check trigger conditions (branch names, paths)
Verify YAML syntax (use VS Code extension)
Look for typos in on: triggers

Permission Denied

# Add required permissions
permissions:
  contents: read
  packages: write
  id-token: write  # For OIDC auth

Secrets Not Available

GITHUB_TOKEN is automatic (no setup needed)
Custom secrets: Settings → Secrets → Actions
Organization secrets may need approval

Example Repository Structure

your-repo/
├── .github/
│   └── workflows/
│       └── ci-cd.yaml
├── k8s/
│   ├── pod.yaml
│   ├── deployment.yaml
│   ├── service.yaml
│   └── job.yaml
├── app/
│   ├── __init__.py
│   └── main.py
├── tests/
│   └── test_app.py
├── Dockerfile
├── requirements.txt
└── README.md

Submission Checklist

Before marking this module complete, ensure your work meets the acceptance criteria listed above for both H1 and H2.

Additional Practice Ideas

Want to go deeper? Try these extensions:

Multi-environment setup: Create separate namespaces for dev/staging/prod
Secrets management: Use Kubernetes Secrets for API keys
Monitoring: Add Prometheus/Grafana for metrics
GitOps: Implement ArgoCD for declarative deployments
Helm charts: Package your application as a Helm chart
Integration tests: Add end-to-end tests to CI/CD
Canary deployments: Implement gradual rollouts

Resources

All reading materials mentioned above, plus:

Module 1 Overview - Refresh core concepts
Docker Documentation - Container deep dive
Kubernetes Documentation - Orchestration details
CI/CD Documentation - Pipeline patterns
Serverless Alternatives - Optional simpler approaches

Getting Help

If you’re stuck:

Check logs: Most issues reveal themselves in logs
Search GitHub Issues: Others have likely hit the same problem
Use k9s: Visual debugging is often faster
Start simple: Get basic version working, then add complexity
Ask for help: Share your error messages and what you’ve tried

Production infrastructure is complex. It’s normal to encounter issues. Each problem you solve teaches you something valuable.

Next Steps

Congratulations on completing Module 1! You now understand:

How to containerize ML applications
Kubernetes orchestration fundamentals
CI/CD automation with GitHub Actions
Serverless alternatives for simpler deployments
How to design production ML systems

Continue to Module 2 to learn about data management and versioning, or revisit any topics that need reinforcement.
