Introduction
Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. It’s essential for running ML systems at scale.
Kubernetes might be overkill for small teams or pet projects. See Serverless Alternatives for simpler deployment options.
Local Setup
Install kind
kind (Kubernetes in Docker) runs local K8s clusters for development:
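The install command itself is missing here; on macOS the usual route is Homebrew (Linux users can grab a prebuilt binary from the kind releases page instead):

```shell
# Install kind via Homebrew (macOS / Linuxbrew)
brew install kind
```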
Create Cluster
Launch a local Kubernetes cluster:

kind create cluster --name ml-in-production
Install kubectl
The Kubernetes command-line tool:
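The install command is missing here as well; with Homebrew it is:

```shell
# Install the Kubernetes CLI (any method from the Kubernetes docs also works)
brew install kubectl
```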
Verify Context
Check that kubectl is pointing to your cluster:

kubectl config get-contexts
Optional: k9s Dashboard
k9s provides a terminal-based UI for managing Kubernetes clusters—think of it as “htop for Kubernetes”:
# Install
brew install derailed/k9s/k9s
# Run
k9s -A
k9s is invaluable for debugging. You can view logs, exec into pods, delete resources, and monitor resource usage—all from a single interface.
Kubernetes Resources
Kubernetes uses YAML manifests to define desired state. Let’s explore the key resource types for ML workloads.
Pods
A Pod is the smallest deployable unit—it wraps one or more containers.
pod-app-web.yaml (pod-app-ml.yaml is near-identical, swapping in the app-ml image):
apiVersion: v1
kind: Pod
metadata:
  name: pod-app-web
spec:
  containers:
    - image: ghcr.io/kyryl-opens-ml/app-web:latest
      name: pod-app-web
Deploy Pods:
kubectl create -f k8s-resources/pod-app-web.yaml
kubectl create -f k8s-resources/pod-app-ml.yaml
Pods are ephemeral and don’t self-heal. For production workloads, use Deployments instead.
Jobs
Jobs run containers to completion—perfect for batch ML training tasks.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-app-ml
spec:
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - image: ghcr.io/kyryl-opens-ml/app-ml:latest
          name: job-app-ml
Deploy Job:
kubectl create -f k8s-resources/job-app-ml.yaml
Key features:
parallelism: 2 runs 2 pods simultaneously for parallel training
restartPolicy: Never stops the kubelet from restarting failed containers in place; the Job controller creates replacement pods instead, up to its backoffLimit
Automatically tracks completion status
Jobs are ideal for ML training workflows, data processing pipelines, and one-off tasks. Use CronJobs for scheduled training runs.
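A scheduled run can be sketched as a CronJob wrapping the same pod template (the resource name and schedule below are assumptions):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob-app-ml        # assumed name
spec:
  schedule: "0 2 * * *"       # assumed: run daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - image: ghcr.io/kyryl-opens-ml/app-ml:latest
              name: cronjob-app-ml
```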
Deployments and Services
Deployments manage replica sets and enable rolling updates. Services provide stable networking.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployments-app-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: deployments-app-web
  template:
    metadata:
      labels:
        app: deployments-app-web
    spec:
      containers:
        - name: app-web
          image: ghcr.io/kyryl-opens-ml/app-web:latest
---
apiVersion: v1
kind: Service
metadata:
  name: deployments-app-web
  labels:
    app: deployments-app-web
spec:
  ports:
    - port: 8080
      protocol: TCP
  selector:
    app: deployments-app-web
Deploy:
kubectl create -f k8s-resources/deployment-app-web.yaml
Access the service:
kubectl port-forward svc/deployments-app-web 8080:8080
Then visit http://localhost:8080 in your browser.
Deployment Features
Replicas: replicas: 2 runs 2 identical pods for high availability
Rolling updates: update images without downtime
Self-healing: automatically restarts failed pods
Scaling: easily scale up/down with kubectl scale
Services provide stable DNS names and load balancing across pod replicas. The selector matches pods by labels.
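Inside the cluster, that stable DNS name looks like this (assuming the default namespace; from your laptop you would port-forward instead):

```shell
# From any pod in the cluster: cluster DNS resolves the Service name
# to its ClusterIP, which load-balances across the matching pods
curl http://deployments-app-web.default.svc.cluster.local:8080/
```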
Common Operations
Viewing Resources
# List all pods
kubectl get pods
# List pods across all namespaces
kubectl get pods -A
# List deployments
kubectl get deployments
# List services
kubectl get services
# Get detailed info
kubectl describe pod pod-app-web
Logs and Debugging
View Logs
Execute Commands
Port Forwarding
# Stream logs from a pod
kubectl logs -f pod-app-ml
# Logs from specific container in pod
kubectl logs pod-name -c container-name
# Previous container logs (if crashed)
kubectl logs pod-name --previous
# Run shell in pod
kubectl exec -it pod-app-web -- /bin/bash
# Run single command
kubectl exec pod-app-web -- ls /app
# Forward local port to pod
kubectl port-forward pod/pod-app-web 8080:8080
# Forward to service
kubectl port-forward svc/deployments-app-web 8080:8080
Scaling
# Scale deployment
kubectl scale deployment deployments-app-web --replicas=5
# Autoscale based on CPU
kubectl autoscale deployment deployments-app-web --min=2 --max=10 --cpu-percent=80
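The declarative equivalent of kubectl autoscale is a HorizontalPodAutoscaler manifest (autoscaling/v2):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deployments-app-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployments-app-web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

CPU-based autoscaling requires metrics-server to be running in the cluster, and the Deployment's containers must declare CPU requests.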
Updates
# Update image
kubectl set image deployment/deployments-app-web app-web=ghcr.io/kyryl-opens-ml/app-web:v2
# Check rollout status
kubectl rollout status deployment/deployments-app-web
# Rollback
kubectl rollout undo deployment/deployments-app-web
Cleanup
# Delete specific resource
kubectl delete pod pod-app-ml
kubectl delete deployment deployments-app-web
# Delete from file
kubectl delete -f k8s-resources/deployment-app-web.yaml
# Delete common resource types (pods, services, deployments, replica sets)
# in the current namespace -- "all" does not literally cover every resource
kubectl delete all --all
Resource Configuration
Resource Requests and Limits
For production ML workloads, always specify resource requirements:
spec:
  containers:
    - name: app-ml
      image: ghcr.io/kyryl-opens-ml/app-ml:latest
      resources:
        requests:
          memory: "2Gi"
          cpu: "1000m"
        limits:
          memory: "4Gi"
          cpu: "2000m"
Requests: resources the scheduler guarantees when placing the pod on a node
Limits: maximum resources the container may use (exceeding the memory limit gets the container OOM-killed)
Without resource limits, a single runaway ML job can consume all cluster resources and crash other workloads.
GPU Support
For GPU-accelerated training, request GPUs under limits (GPUs cannot be overcommitted, and requests are implied by limits). The nodes must run the NVIDIA device plugin:

resources:
  limits:
    nvidia.com/gpu: 1
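A quick way to check that GPU scheduling works is a throwaway pod that just runs nvidia-smi (the pod name and image tag below are assumptions; the node must have the NVIDIA device plugin installed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test          # assumed name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```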
Kubernetes for ML Patterns
Training Jobs
apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
spec:
  template:
    spec:
      containers:
        - name: trainer
          image: your-registry/ml-trainer:v1
          env:
            - name: EXPERIMENT_NAME
              value: "experiment-001"
          resources:
            limits:
              nvidia.com/gpu: 1
      restartPolicy: Never
  backoffLimit: 3
Model Serving
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 3
  selector:               # required by apps/v1; must match the template labels
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: server
          image: your-registry/model-server:v1
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
Managed Kubernetes Providers
For production deployments, use managed Kubernetes services:
Provider       Service       Best For
-------------  ------------  ----------------------------
AWS            EKS           AWS ecosystem integration
Google Cloud   GKE           Best-in-class K8s experience
AWS            Fargate/ECS   Serverless containers
Google Cloud   Cloud Run     Serverless K8s
Managed services handle control plane maintenance, upgrades, and scaling, letting you focus on applications rather than infrastructure.
Best Practices
Namespaces
Organize resources with namespaces:
# Create namespace
kubectl create namespace ml-training
# Deploy to namespace
kubectl create -f job.yaml -n ml-training
# Set default namespace
kubectl config set-context --current --namespace=ml-training
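Namespaces can also be created declaratively, which fits version-controlled manifests:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ml-training
```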
Labels and Selectors
Use labels for organization and selection:
metadata:
  labels:
    app: model-trainer
    version: v2
    environment: production
    team: ml-platform
ConfigMaps and Secrets
Externalize configuration:
# Create ConfigMap
kubectl create configmap model-config --from-file=config.yaml
# Create Secret
kubectl create secret generic api-keys --from-literal=token=abc123
Mount them in pods (volumes must be paired with volumeMounts in the container spec; the mount paths below are examples):

spec:
  containers:
    - name: app-ml
      volumeMounts:
        - name: config
          mountPath: /etc/config
        - name: secrets
          mountPath: /etc/secrets
  volumes:
    - name: config
      configMap:
        name: model-config
    - name: secrets
      secret:
        secretName: api-keys
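Secret values can also be injected as environment variables rather than files; a sketch against the api-keys Secret created above (the variable and container names are assumptions):

```yaml
containers:
  - name: app-ml              # assumed container name
    image: ghcr.io/kyryl-opens-ml/app-ml:latest
    env:
      - name: API_TOKEN       # assumed env var name
        valueFrom:
          secretKeyRef:
            name: api-keys
            key: token        # key from --from-literal=token=abc123
```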
Resources
Learning Materials
Advanced Topics
Next Steps
Learn how to automate building and deploying these resources with CI/CD pipelines.