
Overview

This practice module combines all concepts from Module 1 into real-world tasks. You’ll draft an ML system design, containerize applications, build CI/CD pipelines, and deploy to Kubernetes.
These exercises mirror real production scenarios. Take your time, experiment, and don’t hesitate to reference the documentation pages.

Two-Part Structure

Module 1 practice is divided into two major components:
  1. H1: Initial Design Draft - Plan your ML system architecture
  2. H2: Infrastructure - Implement containerization and deployment

H1: Initial Design Draft

Before writing any code, design your ML system. This is critical for production systems where poor architecture leads to technical debt.

Reading List

Task: Write Your Design Document

Create a comprehensive design document for an ML system using the MLOps template. You can use a real system from your work or create a fictional but realistic one. Your design doc must cover:
Models in Production
  • What models are deployed?
  • How do they interact?
  • Data flow and dependencies
Pros/Cons
  • Architecture strengths
  • Known limitations
  • Trade-offs made

Reference Design Example

See this example design document for inspiration.

Acceptance Criteria

  • Approve - Document thoroughly addresses all required sections
  • No approval - Missing critical sections or insufficient detail
You’ll repeat this task at the end of the course for your final coursework. Use this practice to develop good habits and templates.

H2: Infrastructure

Now implement the infrastructure for your ML system using Docker, Kubernetes, and CI/CD.

Reading List

Task Breakdown

Complete these three pull requests to your repository:
PR1: Dockerfile and Registry

Create a Dockerfile for a simple ML application and push the image to a container registry.
Requirements:
  • Write a Dockerfile with a basic web server or ML script
  • Push image to GitHub Container Registry (ghcr.io) or Docker Hub
  • Tag image appropriately (e.g., v1.0.0, latest)
Example structure:
FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

CMD ["python", "app.py"]
Verification:
# Pull and run your image
docker pull ghcr.io/yourusername/app:latest
docker run --rm ghcr.io/yourusername/app:latest
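The Dockerfile example above expects an app.py next to it. As a minimal sketch of what that file could contain, the following uses only the Python standard library to answer every GET request with a small JSON payload. The names `health_payload`, `Handler`, `build_server`, and `main`, the `PORT` environment variable, and the choice of port 8080 (picked to match the containerPort in the Deployment example in PR3) are assumptions for illustration, not part of the course material.

```python
"""Hypothetical app.py: a tiny JSON endpoint built on the standard library."""
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumption: default to 8080 so the port lines up with the Deployment example.
PORT = int(os.environ.get("PORT", "8080"))


def health_payload(path: str) -> bytes:
    """Return the JSON body sent back for a request to `path`."""
    return json.dumps({"status": "ok", "path": path}).encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = health_payload(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(host: str = "0.0.0.0", port: int = PORT) -> ThreadingHTTPServer:
    # Bind to 0.0.0.0 so the server is reachable from outside the container.
    return ThreadingHTTPServer((host, port), Handler)


def main() -> None:
    server = build_server()
    print(f"listening on port {server.server_address[1]}", flush=True)
    server.serve_forever()
```

To use this as the container entrypoint, end the file with the usual `if __name__ == "__main__": main()` guard. Because the sketch has no third-party dependencies, requirements.txt can stay empty. When running the image locally, publish the port (for example `docker run --rm -p 8080:8080 ...`) to reach the server from the host.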
PR2: GitHub Actions CI/CD

Create a GitHub Actions workflow that builds and pushes your Docker image on every PR.
Requirements:
  • Workflow triggers on pull requests
  • Builds Docker image
  • Runs basic tests (if applicable)
  • Pushes to container registry
  • Displays green checkmark on success
Starter workflow:
name: CI/CD Pipeline

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    
    steps:
      - uses: actions/checkout@v4
      
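      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          # Pull requests from forks get a read-only token, so the push fails for them
          push: true
          # Image names must be lowercase; adjust if your owner or repo name has capitals
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}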
Verification:
  • Check the “Actions” tab in GitHub
  • Ensure workflow completes successfully
  • Verify image appears in GitHub Packages
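One detail of the starter workflow's `tags:` line is easy to miss: container image names must be lowercase, while GitHub owner and repository names may contain capitals. The sketch below shows the normalization in Python under that assumption. The helper name `image_ref` and the sample repository and tag values are hypothetical; in a real workflow the same lowercasing would be done in a workflow step rather than in application code.

```python
def image_ref(repository: str, tag: str, registry: str = "ghcr.io") -> str:
    """Build a reference such as ghcr.io/owner/name:tag from 'Owner/Name'."""
    if "/" not in repository:
        raise ValueError("repository must look like 'owner/name'")
    if not tag:
        raise ValueError("tag must not be empty")
    # Registries reject uppercase characters in the image name, so normalize it.
    # The tag is left untouched because tags may contain uppercase characters.
    return f"{registry}/{repository.lower()}:{tag}"


# Hypothetical values standing in for github.repository and github.sha.
print(image_ref("YourUserName/ML-App", "3f2a1bc"))
# prints: ghcr.io/yourusername/ml-app:3f2a1bc
```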
PR3: Kubernetes Manifests

Write Kubernetes YAML definitions and test them on a local cluster.
Requirements: Create manifests for:
  • Pod: Single container instance
  • Deployment: Replicated application
  • Service: Network access to deployment
  • Job: One-time batch task
File structure:
k8s/
├── pod.yaml
├── deployment.yaml
├── service.yaml
└── job.yaml
Example Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-app
  template:
    metadata:
      labels:
        app: ml-app
    spec:
      containers:
      - name: app
        image: ghcr.io/yourusername/app:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
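A Service only routes traffic to the Deployment's pods if its selector matches the pod template labels (`app: ml-app` above). As a rough sketch of that relationship, the Python below builds a Service manifest as a plain dictionary and checks the selector against a set of pod labels. The helper names `build_service` and `selects` are hypothetical, and the ports are assumed to be 8080 to match the containerPort above. kubectl accepts JSON as well as YAML, so the printed output could be saved and applied, though a hand-written service.yaml is what the task asks for.

```python
import json

# Same labels as the pod template in the Deployment example.
APP_LABELS = {"app": "ml-app"}


def build_service(name: str, labels: dict, port: int = 8080, target_port: int = 8080) -> dict:
    """Return a minimal Service manifest that selects pods carrying `labels`."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": dict(labels),
            # `port` is exposed by the Service; `targetPort` is the container port.
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def selects(service: dict, pod_labels: dict) -> bool:
    """True if every key/value in the Service selector appears in the pod labels."""
    selector = service["spec"]["selector"]
    return all(pod_labels.get(key) == value for key, value in selector.items())


service = build_service("ml-app", APP_LABELS)
print(json.dumps(service, indent=2))
```

If `selects` returns False for your pod labels, the Service would have no endpoints and requests through it would fail even though the pods are running.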
Testing with kind:
# Create cluster
kind create cluster --name ml-test

# Deploy resources
kubectl apply -f k8s/

# Verify
kubectl get all
kubectl logs deployment/ml-app

# Test service (port-forward runs in the foreground, so run curl from a second terminal)
kubectl port-forward svc/ml-app 8080:8080
curl http://localhost:8080
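Right after `kubectl apply`, the pods may not be ready yet, so a single curl can fail even when the manifests are correct. A small retry loop makes the check less flaky. The following is a sketch in Python using only the standard library; the helper names `http_ok` and `wait_until` and the default attempt count and delay are assumptions, not part of the course material.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses.
        return False


def wait_until(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example, with the port-forward from above running in another terminal:
#     ready = wait_until(lambda: http_ok("http://localhost:8080"))
```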

Bonus: Install k9s

Install k9s, a terminal UI for Kubernetes, for a better cluster management experience (Homebrew shown here):
brew install derailed/k9s/k9s
k9s
k9s shortcuts:
  • :pods - View pods
  • :deploy - View deployments
  • :svc - View services
  • :jobs - View jobs
  • l - View logs
  • d - Describe resource
  • ctrl-d - Delete resource
k9s can speed up Kubernetes debugging considerably, and many experienced developers install it early.

Acceptance Criteria

Pass - All three PRs meet requirements:
  1. PR1:
    • Dockerfile builds successfully
    • Image pushed to registry
    • Image runs correctly when pulled
  2. PR2:
    • GitHub Actions workflow exists
    • Workflow runs on PRs
    • All jobs complete successfully (green checkmark)
  3. PR3:
    • All four resource types defined (Pod, Deployment, Service, Job)
    • Resources deploy to kind/minikube successfully
    • Application accessible via Service

Tips and Common Issues

Docker Troubleshooting

# Clear cache and rebuild
docker builder prune -a
docker build --no-cache -t app:latest .

# Show plain build output, including how much build context is transferred
docker build --progress=plain -t app:latest .

Kubernetes Troubleshooting

# Check pod status
kubectl describe pod pod-name

# View events
kubectl get events --sort-by=.metadata.creationTimestamp

# Check logs
kubectl logs pod-name
kubectl logs pod-name --previous  # If pod crashed

GitHub Actions Troubleshooting

  • Check trigger conditions (branch names, paths)
  • Verify YAML syntax (a YAML extension for your editor, such as one for VS Code, helps catch indentation errors)
  • Look for typos in on: triggers

Example Repository Structure

your-repo/
├── .github/
│   └── workflows/
│       └── ci-cd.yaml
├── k8s/
│   ├── pod.yaml
│   ├── deployment.yaml
│   ├── service.yaml
│   └── job.yaml
├── app/
│   ├── __init__.py
│   └── main.py
├── tests/
│   └── test_app.py
├── Dockerfile
├── requirements.txt
└── README.md

Submission Checklist

Before marking this module complete, ensure:
  • Design document covers all required sections
  • Design includes ML Test Score assessment
  • Design identifies potential failure modes
  • Design connects to business metrics
  • PR1: Dockerfile committed and image in registry
  • PR2: GitHub Actions workflow runs successfully
  • PR3: All Kubernetes manifests deploy successfully
  • All three PRs merged to main branch
  • CI/CD pipeline shows green status
  • k9s tool installed (optional but recommended)

Additional Practice Ideas

Want to go deeper? Try these extensions:
  1. Multi-environment setup: Create separate namespaces for dev/staging/prod
  2. Secrets management: Use Kubernetes Secrets for API keys
  3. Monitoring: Add Prometheus/Grafana for metrics
  4. GitOps: Implement ArgoCD for declarative deployments
  5. Helm charts: Package your application as a Helm chart
  6. Integration tests: Add end-to-end tests to CI/CD
  7. Canary deployments: Implement gradual rollouts

Resources

All reading materials mentioned above, plus:

Getting Help

If you’re stuck:
  1. Check logs: Most issues reveal themselves in logs
  2. Search GitHub Issues: Others have likely hit the same problem
  3. Use k9s: Visual debugging is often faster
  4. Start simple: Get a basic version working, then add complexity
  5. Ask for help: Share your error messages and what you’ve tried
Production infrastructure is complex. It’s normal to encounter issues. Each problem you solve teaches you something valuable.

Next Steps

Congratulations on completing Module 1! You now understand:
  • How to containerize ML applications
  • Kubernetes orchestration fundamentals
  • CI/CD automation with GitHub Actions
  • Serverless alternatives for simpler deployments
  • How to design production ML systems
Continue to Module 2 to learn about data management and versioning, or revisit any topics that need reinforcement.
