Overview
Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a deployment, replica set, or stateful set based on observed metrics such as CPU utilization, memory usage, or custom application metrics.
How HPA Works
┌──────────────┐     reads metrics      ┌──────────────┐
│     HPA      │ ◄───────────────────── │   Metrics    │
│  Controller  │                        │    Server    │
└──────────────┘                        └──────────────┘
        │                                       ▲
        │ scales                                │ collects
        ▼                                       │
┌──────────────┐     manages pods       ┌──────────────┐
│  Deployment  │ ─────────────────────> │     Pods     │
└──────────────┘                        └──────────────┘
Scaling algorithm:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
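This formula can be sketched as a small Python helper (`desired_replicas` is a hypothetical function for illustration, not the controller's actual code; the real controller also applies a tolerance band and clamps the result to minReplicas/maxReplicas):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    # Pure form of the HPA scaling formula. The real controller
    # additionally skips scaling when the ratio is within a small
    # tolerance of 1.0 and bounds the result by min/max replicas.
    return math.ceil(current_replicas * (current_metric / target_metric))

print(desired_replicas(1, 100, 95))  # 2: one pod at 100% CPU vs a 95% target
print(desired_replicas(4, 30, 60))   # 2: scale down when load halves
```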
Prerequisites
Metrics Server
HPA requires the Metrics Server to be installed in your cluster:
# Check if metrics server is running
kubectl get deployment metrics-server -n kube-system
# If not installed (GKE usually has it by default)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify metrics are available
kubectl top nodes
kubectl top pods
Resource Requests
HPA requires resource requests to be defined in your deployment. Without requests, CPU/memory-based autoscaling will not work.
spec:
  containers:
    - name: app
      image: app:v1.0
      resources:
        requests:
          cpu: 100m      # Required for CPU-based HPA
          memory: 128Mi  # Required for memory-based HPA
        limits:
          cpu: 200m
          memory: 256Mi
CPU-Based Autoscaling
Basic HPA Configuration
From the source backend deployment:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  # Target deployment to scale
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  # Replica boundaries
  minReplicas: 1
  maxReplicas: 2
  # Scaling metrics
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 95  # Scale up when average CPU > 95%
Understanding the Configuration
minReplicas: Minimum number of pods (HPA never scales below this)
maxReplicas: Maximum number of pods (HPA never scales above this)
averageUtilization: Target CPU percentage, averaged across all pods of the target workload
Example behavior:
Current: 1 pod at 100% CPU
Target: 95% CPU
Action: Scale to 2 pods (100/95 ≈ 1.05 → round up to 2)
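The utilization percentages above come from comparing each pod's actual usage to its resource request, then averaging across pods, which is why HPA cannot work without requests. A minimal Python sketch of that averaging with made-up usage figures (`average_utilization` is a hypothetical helper):

```python
def average_utilization(usages_millicores: list[float],
                        request_millicores: float) -> float:
    # Per-pod utilization = actual usage / requested amount, as a percentage.
    # HPA averages this figure across all pods of the target workload.
    per_pod = [u / request_millicores * 100 for u in usages_millicores]
    return sum(per_pod) / len(per_pod)

# Three pods each requesting 100m CPU, currently using 90m, 120m, and 60m:
print(average_utilization([90, 120, 60], 100))  # 90.0 (percent of requests)
```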
Creating the HPA
# Apply HPA configuration
kubectl apply -f hpa.yml
# Or create directly with kubectl
kubectl autoscale deployment exchange-router-deployment \
--cpu-percent=95 \
--min=1 \
--max=2
Memory-Based Autoscaling
Scale based on memory utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # Scale when average memory > 80%
Multi-Metric Autoscaling
Combine multiple metrics for more sophisticated scaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Absolute value target
    - type: Resource
      resource:
        name: cpu
        target:
          type: AverageValue
          averageValue: "500m"  # 0.5 CPU cores
When multiple metrics are specified, HPA calculates desired replicas for each metric and uses the highest value to ensure all metrics are satisfied.
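That max-over-metrics rule can be sketched in Python (`multi_metric_replicas` is a hypothetical helper, not the controller's real code):

```python
import math

def multi_metric_replicas(current_replicas: int,
                          metrics: list[tuple[float, float]]) -> int:
    # metrics: list of (current_value, target_value) pairs.
    # HPA computes a desired replica count per metric and takes the
    # maximum, so that every metric's target is satisfied.
    return max(math.ceil(current_replicas * (cur / target))
               for cur, target in metrics)

# CPU at 90% against a 70% target, memory at 50% against an 80% target:
# CPU wants ceil(4 * 90/70) = 6, memory wants ceil(4 * 50/80) = 3 -> 6 wins.
print(multi_metric_replicas(4, [(90, 70), (50, 80)]))  # 6
```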
Scaling Behavior Configuration
Control how quickly HPA scales up and down:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  # Scaling behavior policies
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # Scale up immediately
      policies:
        - type: Percent
          value: 100  # Double the pods
          periodSeconds: 15
        - type: Pods
          value: 4  # Or add 4 pods
          periodSeconds: 15
      selectPolicy: Max  # Use the policy that scales faster
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 50  # Remove up to 50% of pods
          periodSeconds: 60
        - type: Pods
          value: 2  # Or remove 2 pods
          periodSeconds: 60
      selectPolicy: Min  # Use the policy that scales slower
Behavior Parameters
stabilizationWindowSeconds: How far back the controller looks at previous scaling recommendations before acting; a longer window smooths out metric fluctuations
type: Percent: Scale by a percentage of current replicas
type: Pods: Scale by an absolute number of pods
periodSeconds: The time window over which a policy's limit applies
selectPolicy: Max (fastest policy wins), Min (slowest policy wins), or Disabled (no scaling in that direction)
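The interplay between policies and selectPolicy can be sketched in Python (`scale_up_limit` is a hypothetical helper for illustration, not the controller's implementation):

```python
import math

def scale_up_limit(current_replicas: int,
                   policies: list[tuple[str, int]],
                   select_policy: str = "Max") -> int:
    # Each policy yields the maximum number of pods that may be
    # added within its period; selectPolicy picks between them.
    limits = []
    for ptype, value in policies:
        if ptype == "Percent":
            limits.append(math.ceil(current_replicas * value / 100))
        elif ptype == "Pods":
            limits.append(value)
    return max(limits) if select_policy == "Max" else min(limits)

# Percent=100 (double) vs Pods=4, starting from 10 replicas:
print(scale_up_limit(10, [("Percent", 100), ("Pods", 4)]))         # 10 pods may be added
print(scale_up_limit(10, [("Percent", 100), ("Pods", 4)], "Min"))  # only 4
```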
Custom Metrics
Scale based on application-specific metrics using custom metrics APIs:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Custom metric: requests per second
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"  # Target 1000 RPS per pod
    # External metric: queue depth
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: "orders"
        target:
          type: AverageValue
          averageValue: "30"  # Target 30 messages per pod
Custom metrics require additional components like Prometheus Adapter or Stackdriver Adapter.
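For Pods and External metrics with an AverageValue target, the desired replica count works out to the metric total divided by the per-pod target. A hypothetical sketch (not the controller's real code):

```python
import math

def replicas_for_average_value(metric_values: list[float],
                               target_per_pod: float) -> int:
    # AverageValue targets divide the metric total evenly across pods:
    # desired = ceil(sum(values) / target_per_pod).
    return math.ceil(sum(metric_values) / target_per_pod)

# Four pods each serving ~1500 RPS against a 1000 RPS-per-pod target:
print(replicas_for_average_value([1500, 1500, 1500, 1500], 1000))  # 6
```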
Monitoring HPA
View HPA Status
# List all HPAs
kubectl get hpa
# Watch HPA in real-time
kubectl get hpa exchange-router-hpa --watch
# Describe HPA for detailed information
kubectl describe hpa exchange-router-hpa
Output example:
NAME                  REFERENCE                                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
exchange-router-hpa   Deployment/exchange-router-deployment    45%/95%   1         2         1          5m
View Scaling Events
# View HPA events
kubectl describe hpa exchange-router-hpa | grep -A 10 Events
# View all scaling events
kubectl get events --field-selector involvedObject.name=exchange-router-hpa
Check Current Metrics
# View pod resource usage
kubectl top pods -l app=exchange-router
# View node resource usage
kubectl top nodes
# Get HPA metrics
kubectl get hpa exchange-router-hpa -o yaml | grep -A 5 currentMetrics
Production Scaling Policies
High-Traffic Application
Aggressive scaling for user-facing services:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3  # Always have 3 for availability
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60  # Scale early
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 200  # Triple capacity quickly
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 600  # Wait 10 minutes
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120  # Remove 1 pod every 2 minutes
Background Worker
Conservative scaling for batch processing:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: background-worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: queue_depth
        target:
          type: AverageValue
          averageValue: "50"  # 50 jobs per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2  # Add 2 pods at a time
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 180  # Remove 1 pod every 3 minutes
Best Practices
Set Resource Requests: Always define CPU and memory requests for accurate autoscaling
Avoid Aggressive Scaling: Use stabilization windows to prevent flapping during traffic spikes
Set Realistic Thresholds: Target 60-80% CPU utilization to leave headroom for spikes
Monitor Scaling Events: Track scaling patterns to tune your HPA configuration
Use Multiple Metrics: Combine CPU, memory, and custom metrics for robust scaling
Test Scaling Behavior: Load test your application to verify HPA responds correctly
Vertical Pod Autoscaler (VPA)
While HPA scales the number of pods, VPA adjusts CPU and memory requests:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: exchange-router-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  updatePolicy:
    updateMode: Auto  # Auto, Recreate, Initial, or Off
  resourcePolicy:
    containerPolicies:
      - containerName: router
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 2Gi
Do not use HPA and VPA on the same CPU/memory metrics simultaneously. Use HPA for horizontal scaling and VPA for vertical resource tuning separately, or use VPA in recommendation mode only.
Troubleshooting
HPA Shows “Unknown” Metrics
# Check if metrics server is running
kubectl get deployment metrics-server -n kube-system
# Check metrics server logs
kubectl logs -n kube-system -l k8s-app=metrics-server
# Verify resource requests are set
kubectl get deployment exchange-router-deployment -o yaml | grep -A 5 resources
HPA Not Scaling
# Check HPA conditions
kubectl describe hpa exchange-router-hpa
# Verify current metrics
kubectl top pods -l app=exchange-router
# Check deployment selector matches pods
kubectl get deployment exchange-router-deployment -o yaml | grep -A 3 selector
kubectl get pods -l app=exchange-router
Rapid Scaling Flapping
Increase stabilizationWindowSeconds for scale-down
Adjust target utilization threshold
Check for metric spikes causing oscillation
Review scaling policies and make them more conservative
Pods Not Scheduled (Max Replicas Reached)
# Check cluster capacity
kubectl top nodes
# Check pending pods
kubectl get pods --field-selector=status.phase=Pending
# Describe pending pod
kubectl describe pod <pending-pod-name>
# Consider cluster autoscaling or increasing node capacity
Load Testing HPA
Test your HPA configuration:
# Generate load using kubectl run
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://exchange-router-service; done"
# Watch HPA scale
kubectl get hpa exchange-router-hpa --watch
# Monitor pod count
watch kubectl get pods -l app=exchange-router