
Overview

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU utilization, memory usage, or custom application metrics.

How HPA Works

┌──────────────┐      reads metrics      ┌──────────────┐
│     HPA      │ ◄────────────────────── │   Metrics    │
│  Controller  │                          │    Server    │
└──────────────┘                          └──────────────┘
       │                                         ▲
       │ scales                                  │
       ▼                                         │ collects
┌──────────────┐      manages pods       ┌──────────────┐
│  Deployment  │ ──────────────────────> │     Pods     │
└──────────────┘                          └──────────────┘
Scaling algorithm:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
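As a sketch, the formula can be evaluated directly (the function name is illustrative):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    # desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
    return math.ceil(current_replicas * (current_metric / target_metric))

# 1 pod at 100% CPU against a 95% target: 100/95 ≈ 1.05, rounded up to 2.
print(desired_replicas(1, 100, 95))  # 2
# Scaling down: 4 pods at 30% against a 60% target halves the fleet.
print(desired_replicas(4, 30, 60))   # 2
```

Note that the real controller also applies a tolerance (10% by default, via `--horizontal-pod-autoscaler-tolerance`), so small deviations from the target do not trigger scaling.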

Prerequisites

Metrics Server

HPA requires the Metrics Server to be installed in your cluster:
# Check if metrics server is running
kubectl get deployment metrics-server -n kube-system

# If not installed (GKE usually has it by default)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics are available
kubectl top nodes
kubectl top pods

Resource Requests

HPA requires resource requests to be defined in your deployment. Without requests, CPU/memory-based autoscaling will not work.
spec:
  containers:
    - name: app
      image: app:v1.0
      resources:
        requests:
          cpu: 100m      # Required for CPU-based HPA
          memory: 128Mi  # Required for memory-based HPA
        limits:
          cpu: 200m
          memory: 256Mi
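Utilization in HPA is measured against the pod's requests, not its limits. A minimal sketch of that calculation (values are illustrative):

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    # HPA utilization = actual usage / requested amount, as a percentage.
    return 100.0 * usage_millicores / request_millicores

# With a 100m request, using 150m reports as 150% utilization,
# even though the 200m limit has not been reached.
print(cpu_utilization_percent(150, 100))  # 150.0
```

This is also why autoscaling misbehaves when requests are set far below real usage: utilization sits permanently above the target.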

CPU-Based Autoscaling

Basic HPA Configuration

An example HPA from the backend deployment:
hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  # Target deployment to scale
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  
  # Replica boundaries
  minReplicas: 1
  maxReplicas: 2
  
  # Scaling metrics
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 95  # Target 95% average CPU across all pods

Understanding the Configuration

  • minReplicas: Minimum number of pods (never scale below this)
  • maxReplicas: Maximum number of pods (never scale above this)
  • averageUtilization: Target CPU percentage (calculated across all pods)
Example behavior:
  • Current: 1 pod at 100% CPU
  • Target: 95% CPU
  • Action: Scale to 2 pods (100/95 ≈ 1.05 → round up to 2)

Creating the HPA

# Apply HPA configuration
kubectl apply -f hpa.yml

# Or create directly with kubectl
kubectl autoscale deployment exchange-router-deployment \
  --cpu-percent=95 \
  --min=1 \
  --max=2

Memory-Based Autoscaling

Scale based on memory utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Target 80% average memory utilization

Multi-Metric Autoscaling

Combine multiple metrics for more sophisticated scaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Memory utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Absolute value target
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: "500m"  # 0.5 CPU cores
When multiple metrics are specified, HPA calculates desired replicas for each metric and uses the highest value to ensure all metrics are satisfied.
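A sketch of that selection logic, reusing the scaling formula (the metric values are illustrative):

```python
import math

def desired_for_metric(current_replicas, current, target):
    # Per-metric recommendation from the standard HPA formula.
    return math.ceil(current_replicas * (current / target))

def desired_replicas_multi(current_replicas, metrics):
    # metrics: list of (current_value, target_value) pairs.
    # Each metric yields its own recommendation; the highest wins.
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)

# 4 pods: CPU at 90% (target 70%) wants 6, memory at 60% (target 80%)
# would allow 3. The HPA picks 6 so both targets are satisfied.
print(desired_replicas_multi(4, [(90, 70), (60, 80)]))  # 6
```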

Scaling Behavior Configuration

Control how quickly HPA scales up and down:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Scaling behavior policies
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # Scale up immediately
      policies:
      - type: Percent
        value: 100  # Double the pods
        periodSeconds: 15
      - type: Pods
        value: 4  # Or add 4 pods
        periodSeconds: 15
      selectPolicy: Max  # Use the policy that scales faster
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50  # Remove up to 50% of pods
        periodSeconds: 60
      - type: Pods
        value: 2  # Or remove 2 pods
        periodSeconds: 60
      selectPolicy: Min  # Use the policy that scales slower

Behavior Parameters

  • stabilizationWindowSeconds: How far back the HPA considers previous scaling recommendations; the most conservative recommendation within the window is applied, which smooths out metric fluctuations
  • type: Percent: Scale by a percentage of current replicas
  • type: Pods: Scale by an absolute number of pods
  • periodSeconds: Time window for the policy
  • selectPolicy: Max (fastest), Min (slowest), or Disabled
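A sketch of how the scale-up policies above combine: each policy caps the change allowed per period, and selectPolicy picks between them (the replica counts are illustrative):

```python
import math

def allowed_scale_up(current_replicas, percent_value, pods_value, select_policy="Max"):
    # Percent policy: may add up to percent_value% of current replicas.
    by_percent = math.ceil(current_replicas * percent_value / 100)
    # Pods policy: may add up to an absolute number of pods.
    by_pods = pods_value
    pick = max if select_policy == "Max" else min
    return current_replicas + pick(by_percent, by_pods)

# With 10 replicas, Percent=100 allows +10 while Pods=4 allows +4.
# selectPolicy: Max takes the faster policy (+10); Min the slower (+4).
print(allowed_scale_up(10, 100, 4, "Max"))  # 20
print(allowed_scale_up(10, 100, 4, "Min"))  # 14
```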

Custom Metrics

Scale based on application-specific metrics using custom metrics APIs:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
  # Custom metric: requests per second
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"  # Target 1000 RPS per pod
  # External metric: queue depth
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: "orders"
      target:
        type: AverageValue
        averageValue: "30"  # Target 30 messages per pod
Custom metrics require additional components like Prometheus Adapter or Stackdriver Adapter.
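For example, with the Prometheus Adapter, a rule along these lines (a sketch; the series and label names are assumptions) could expose `http_requests_per_second` to the custom metrics API:

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```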

Monitoring HPA

View HPA Status

# List all HPAs
kubectl get hpa

# Watch HPA in real-time
kubectl get hpa exchange-router-hpa --watch

# Describe HPA for detailed information
kubectl describe hpa exchange-router-hpa
Example output (TARGETS shows current/target utilization):
NAME                   REFERENCE                              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
exchange-router-hpa   Deployment/exchange-router-deployment   45%/95%   1         2         1          5m

View Scaling Events

# View HPA events
kubectl describe hpa exchange-router-hpa | grep -A 10 Events

# View all scaling events
kubectl get events --field-selector involvedObject.name=exchange-router-hpa

Check Current Metrics

# View pod resource usage
kubectl top pods -l app=exchange-router

# View node resource usage
kubectl top nodes

# Get HPA metrics
kubectl get hpa exchange-router-hpa -o yaml | grep -A 5 currentMetrics

Production Scaling Policies

High-Traffic Application

Aggressive scaling for user-facing services:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3  # Always have 3 for availability
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # Scale early
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 200  # Triple capacity quickly
        periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 600  # Wait 10 minutes
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120  # Remove 1 pod every 2 minutes

Background Worker

Conservative scaling for batch processing:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: background-worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: queue_depth
      target:
        type: AverageValue
        averageValue: "50"  # 50 jobs per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2  # Add 2 pods at a time
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 180  # Remove 1 pod every 3 minutes

Best Practices

Set Resource Requests

Always define CPU and memory requests for accurate autoscaling

Avoid Aggressive Scaling

Use stabilization windows to prevent flapping during traffic spikes

Set Realistic Thresholds

Target 60-80% utilization for CPU to allow headroom for spikes

Monitor Scaling Events

Track scaling patterns to tune your HPA configuration

Use Multiple Metrics

Combine CPU, memory, and custom metrics for robust scaling

Test Scaling Behavior

Load test your application to verify HPA responds correctly

Vertical Pod Autoscaler (VPA)

While HPA scales the number of pods, VPA adjusts CPU and memory requests:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: exchange-router-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  updatePolicy:
    updateMode: Auto  # Auto, Recreate, Initial, or Off
  resourcePolicy:
    containerPolicies:
    - containerName: router
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 2Gi
Do not use HPA and VPA on the same CPU/memory metrics simultaneously. Use HPA for horizontal scaling and VPA for vertical resource tuning separately, or use VPA in recommendation mode only.

Troubleshooting

HPA Shows “Unknown” Metrics

# Check if metrics server is running
kubectl get deployment metrics-server -n kube-system

# Check metrics server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Verify resource requests are set
kubectl get deployment exchange-router-deployment -o yaml | grep -A 5 resources

HPA Not Scaling

# Check HPA conditions
kubectl describe hpa exchange-router-hpa

# Verify current metrics
kubectl top pods -l app=exchange-router

# Check deployment selector matches pods
kubectl get deployment exchange-router-deployment -o yaml | grep -A 3 selector
kubectl get pods -l app=exchange-router

Rapid Scaling Flapping

  • Increase stabilizationWindowSeconds for scale-down
  • Adjust target utilization threshold
  • Check for metric spikes causing oscillation
  • Review scaling policies and make them more conservative
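The scale-down stabilization window damps flapping because the HPA remembers recent recommendations and acts only on the most conservative (highest) one. A minimal sketch of that behavior:

```python
from collections import deque

def stabilized_scale_down(recommendations, window):
    # Keep the last `window` recommendations and scale down only to the
    # highest among them, so a brief dip cannot remove pods immediately.
    recent = deque(maxlen=window)
    result = []
    for r in recommendations:
        recent.append(r)
        result.append(max(recent))
    return result

# A short dip to 2 replicas is ignored while 5 is still in the window.
print(stabilized_scale_down([5, 5, 2, 2, 2, 2], window=3))  # [5, 5, 5, 5, 2, 2]
```

Lengthening the window therefore trades slower scale-down for stability.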

Pods Not Scheduled (Max Replicas Reached)

# Check cluster capacity
kubectl top nodes

# Check pending pods
kubectl get pods --field-selector=status.phase=Pending

# Describe pending pod
kubectl describe pod <pending-pod-name>

# Consider cluster autoscaling or increasing node capacity

Load Testing HPA

Test your HPA configuration:
# Generate load using kubectl run
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://exchange-router-service; done"

# Watch HPA scale
kubectl get hpa exchange-router-hpa --watch

# Monitor pod count
watch kubectl get pods -l app=exchange-router
