
Introduction

Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. It’s essential for running ML systems at scale.
Kubernetes might be overkill for small teams or pet projects. See Serverless Alternatives for simpler deployment options.

Local Setup

Install Required Tools

1. Install kind

kind (Kubernetes in Docker) runs local K8s clusters for development:
brew install kind

2. Create Cluster

Launch a local Kubernetes cluster:
kind create cluster --name ml-in-production

3. Install kubectl

The Kubernetes command-line tool:
brew install kubectl

4. Verify Context

Check that kubectl is pointing to your cluster (kind prefixes context names with kind-, so the current context should be kind-ml-in-production):
kubectl config get-contexts

Optional: k9s Dashboard

k9s provides a terminal-based UI for managing Kubernetes clusters—think of it as “htop for Kubernetes”:
# Install
brew install derailed/k9s/k9s

# Run
k9s -A
k9s is invaluable for debugging. You can view logs, exec into pods, delete resources, and monitor resource usage—all from a single interface.

Kubernetes Resources

Kubernetes uses YAML manifests to define desired state. Let’s explore the key resource types for ML workloads.

Pods

A Pod is the smallest deployable unit—it wraps one or more containers.
apiVersion: v1
kind: Pod
metadata:
  name: pod-app-web
spec:
  containers:
    - image: ghcr.io/kyryl-opens-ml/app-web:latest
      name: pod-app-web
Deploy the Pod:
kubectl create -f k8s-resources/pod-app-web.yaml
Pods are ephemeral and don’t self-heal. For production workloads, use Deployments instead.

Jobs

Jobs run containers to completion—perfect for batch ML training tasks.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-app-ml
spec:
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - image: ghcr.io/kyryl-opens-ml/app-ml:latest
          name: job-app-ml
Deploy Job:
kubectl create -f k8s-resources/job-app-ml.yaml
Key features:
  • parallelism: 2 runs 2 pods simultaneously for parallel training
  • restartPolicy: Never prevents restarts on failure
  • Automatically tracks completion status
Jobs are ideal for ML training workflows, data processing pipelines, and one-off tasks. Use CronJobs for scheduled training runs.
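
For scheduled training runs, a CronJob wraps the same Job template in a cron schedule. A minimal sketch (the name and schedule below are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-training  # illustrative name
spec:
  schedule: "0 2 * * *"   # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trainer
              image: ghcr.io/kyryl-opens-ml/app-ml:latest
```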

Deployments and Services

Deployments manage replica sets and enable rolling updates. Services provide stable networking.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployments-app-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: deployments-app-web
  template:
    metadata:
      labels:
        app: deployments-app-web
    spec:
      containers:
        - name: app-web
          image: ghcr.io/kyryl-opens-ml/app-web:latest 
---
apiVersion: v1
kind: Service
metadata:
  name: deployments-app-web
  labels:
    app: deployments-app-web
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: deployments-app-web
Deploy:
kubectl create -f k8s-resources/deployment-app-web.yaml
Access the service:
kubectl port-forward svc/deployments-app-web 8080:8080
Then visit http://localhost:8080 in your browser.

Deployment Features

  • Replicas: replicas: 2 runs 2 identical pods for high availability
  • Rolling updates: Update images without downtime
  • Self-healing: Automatically restarts failed pods
  • Scaling: Easily scale up/down with kubectl scale
Services provide stable DNS names and load balancing across pod replicas; the selector matches pods by labels. Within the cluster, this Service is reachable at deployments-app-web.&lt;namespace&gt;.svc.cluster.local on port 8080.

Common Operations

Viewing Resources

# List all pods
kubectl get pods

# List pods across all namespaces
kubectl get pods -A

# List deployments
kubectl get deployments

# List services
kubectl get services

# Get detailed info
kubectl describe pod pod-app-web

Logs and Debugging

# Stream logs from a pod
kubectl logs -f pod-app-ml

# Logs from specific container in pod
kubectl logs pod-name -c container-name

# Previous container logs (if crashed)
kubectl logs pod-name --previous

Scaling

# Scale deployment
kubectl scale deployment deployments-app-web --replicas=5

# Autoscale based on CPU
kubectl autoscale deployment deployments-app-web --min=2 --max=10 --cpu-percent=80
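
The kubectl autoscale command above is shorthand for creating a HorizontalPodAutoscaler. A roughly equivalent autoscaling/v2 manifest would look like this (sketch):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deployments-app-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployments-app-web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Note that CPU-based autoscaling requires metrics-server running in the cluster, and the Deployment's containers must declare CPU requests for the utilization target to be computed.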

Updates

# Update image
kubectl set image deployment/deployments-app-web app-web=ghcr.io/kyryl-opens-ml/app-web:v2

# Check rollout status
kubectl rollout status deployment/deployments-app-web

# Rollback
kubectl rollout undo deployment/deployments-app-web

Cleanup

# Delete specific resource
kubectl delete pod pod-app-ml
kubectl delete deployment deployments-app-web

# Delete from file
kubectl delete -f k8s-resources/deployment-app-web.yaml

# Delete all resources in namespace (covers core workload types only;
# ConfigMaps, Secrets, and PVCs must be deleted separately)
kubectl delete all --all

Resource Configuration

Resource Requests and Limits

For production ML workloads, always specify resource requirements:
spec:
  containers:
    - name: app-ml
      image: ghcr.io/kyryl-opens-ml/app-ml:latest
      resources:
        requests:
          memory: "2Gi"
          cpu: "1000m"
        limits:
          memory: "4Gi"
          cpu: "2000m"
  • Requests: Guaranteed resources used by the scheduler for placement (cpu: "1000m" = 1 CPU core)
  • Limits: Maximum resources the container may use; exceeding the memory limit gets it OOM-killed, while excess CPU is throttled
Without resource limits, a single runaway ML job can consume all cluster resources and crash other workloads.

GPU Support

For GPU-accelerated training:
resources:
  limits:
    nvidia.com/gpu: 1
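
Scheduling onto GPUs requires the NVIDIA device plugin on the cluster's nodes. With it installed, a minimal smoke-test Pod requesting one GPU might look like this (the Pod name and CUDA image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```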

Kubernetes for ML Patterns

Training Jobs

apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: your-registry/ml-trainer:v1
        env:
        - name: EXPERIMENT_NAME
          value: "experiment-001"
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
  backoffLimit: 3

Model Serving

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: server
        image: your-registry/model-server:v1
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
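
To give the model server a stable in-cluster address, pair the Deployment with a Service (a sketch, assuming the Deployment's pods carry an app: model-server label):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: model-server
spec:
  selector:
    app: model-server  # assumes pods are labeled app: model-server
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
```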

Managed Kubernetes Providers

For production deployments, use managed Kubernetes services:
Provider        Service        Best For
AWS             EKS            Managed Kubernetes with deep AWS ecosystem integration
Google Cloud    GKE            Mature, fully managed Kubernetes experience
AWS             Fargate/ECS    Serverless containers (ECS is AWS's own orchestrator, not K8s)
Google Cloud    Cloud Run      Serverless containers without managing a cluster
Managed services handle control plane maintenance, upgrades, and scaling, letting you focus on applications rather than infrastructure.

Best Practices

Namespaces

Organize resources with namespaces:
# Create namespace
kubectl create namespace ml-training

# Deploy to namespace
kubectl create -f job.yaml -n ml-training

# Set default namespace
kubectl config set-context --current --namespace=ml-training
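
Namespaces can also be declared as manifests and checked into version control alongside your other resources:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ml-training
```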

Labels and Selectors

Use labels for organization and selection:
metadata:
  labels:
    app: model-trainer
    version: v2
    environment: production
    team: ml-platform

ConfigMaps and Secrets

Externalize configuration:
# Create ConfigMap
kubectl create configmap model-config --from-file=config.yaml

# Create Secret
kubectl create secret generic api-keys --from-literal=token=abc123
Mount in pods (volumes declare the sources; volumeMounts attach them to a container, with illustrative paths):
volumes:
- name: config
  configMap:
    name: model-config
- name: secrets
  secret:
    secretName: api-keys
containers:
- name: app
  volumeMounts:
  - name: config
    mountPath: /etc/config
  - name: secrets
    mountPath: /etc/secrets
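
Secrets can also be injected as environment variables instead of files (a sketch; API_TOKEN is an illustrative variable name):

```yaml
env:
  - name: API_TOKEN
    valueFrom:
      secretKeyRef:
        name: api-keys
        key: token
```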

Next Steps

Learn how to automate building and deploying these resources with CI/CD pipelines.
