This page documents the HorizontalPodAutoscaler (HPA) configuration used in the exchange platform. HPA automatically scales the number of pods based on observed metrics like CPU utilization.

Backend Router HPA

Location: backend/hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 95

Configuration Breakdown

Metadata

metadata:
  name: exchange-router-hpa
Identifies the HPA resource in the cluster.

Scale Target

scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: exchange-router-deployment
Target Configuration:
  • Kind: Deployment - Scales a Deployment resource
  • Name: exchange-router-deployment - Specific deployment to scale
  • API Version: apps/v1 - Kubernetes API version for Deployments

Replica Bounds

minReplicas: 1
maxReplicas: 2
Scaling Boundaries:
  • Minimum Replicas: 1 - Always maintain at least one pod
  • Maximum Replicas: 2 - Never scale beyond two pods
  • Current Setting: Conservative scaling for controlled resource usage
Scaling Behavior:
Low Traffic:  1 pod  (CPU < 95%)
High Traffic: 2 pods (CPU ≥ 95%)

Metrics Configuration

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 95
Metric Details:
  • Type: Resource - Uses pod resource metrics (CPU/memory)
  • Resource Name: cpu - Monitors CPU utilization
  • Target Type: Utilization - Percentage-based threshold
  • Threshold: 95% - Scale up when average CPU exceeds 95%

How HPA Works

Scaling Algorithm

Basic Formula:
desiredReplicas = ceil[currentReplicas × (currentMetricValue / targetMetricValue)]
Example Calculation:
Current: 1 pod at 100% CPU
Target: 95% CPU

desiredReplicas = ceil[1 × (100 / 95)] = ceil[1.05] = 2 pods
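
This calculation can be reproduced with a short Python sketch (a simplification: the real controller also applies a tolerance band, 10% by default, before acting on the ratio):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=2):
    """Core HPA formula, clamped to the configured replica bounds."""
    desired = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(1, 100, 95))  # 1 pod at 100% CPU -> scales to 2
```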

Scaling Decision Loop

HPA evaluates metrics every 15 seconds (default):
1. Query Metrics

   Fetch CPU utilization from metrics-server

2. Calculate Average

   Average CPU across all pods: 98%

3. Compare to Target

   98% > 95% → Need to scale up

4. Calculate Desired Replicas

   ceil[1 × (98/95)] = 2 pods

5. Update Deployment

   Set replicas: 1 → 2
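
The five steps above can be sketched as one iteration of a simplified control loop (hypothetical helper names; the real controller also applies a tolerance band and stabilization windows):

```python
import math

def hpa_sync(fetch_cpu_percents, current_replicas,
             target=95, min_replicas=1, max_replicas=2):
    """One sync period of a simplified HPA decision loop.

    fetch_cpu_percents stands in for the metrics-server query and
    returns the CPU utilization of each pod behind the target.
    """
    samples = fetch_cpu_percents()                  # 1. query metrics
    average = sum(samples) / len(samples)           # 2. average across pods
    ratio = average / target                        # 3. compare to target
    desired = math.ceil(current_replicas * ratio)   # 4. desired replicas
    # 5. the caller would patch spec.replicas on the Deployment
    return max(min_replicas, min(max_replicas, desired))

print(hpa_sync(lambda: [98.0], 1))  # 98% > 95% -> scale 1 -> 2
```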

Scaling Behavior

Scale Up:
  • Triggers when average CPU > 95%
  • Happens immediately (default scale-up stabilization window is 0 seconds)
  • Limited by maxReplicas: 2
Scale Down:
  • Triggers when the computed desired replica count falls below the current count (with 2 pods and a 95% target, average CPU must drop below ~47.5%)
  • Waits 5 minutes (default stabilization window)
  • Limited by minReplicas: 1
  • Gradual to prevent flapping
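
Scale-down stabilization can be pictured as acting on the highest replica recommendation seen during the window, so a short dip in load does not immediately remove pods (a sketch of the mechanism, not the controller's actual code):

```python
def stabilized_replicas(recent_recommendations):
    """Scale-down stabilization: act on the highest replica count
    recommended during the stabilization window."""
    return max(recent_recommendations)

# Recommendations from the last few sync periods inside a 5-minute
# window: the dip to 1 pod is ignored while any 2-pod recommendation
# is still in the window.
print(stabilized_replicas([2, 2, 1, 1]))  # -> 2
```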

CPU Utilization Calculation

Pod CPU Request

From deployment manifest (reference/deployments.mdx:20-24):
resources:
  requests:
    cpu: "300m"
  limits:
    cpu: "2000m"
CPU Request: 300 millicores (0.3 cores)

Utilization Percentage

CPU Utilization % = (Actual CPU Usage / CPU Request) × 100
Examples:
Actual Usage   Request   Utilization   Action
------------   -------   -----------   ----------------------
250m           300m      83%           No scaling
285m           300m      95%           Threshold reached
300m           300m      100%          Scale up
450m           300m      150%          Scale up (capped at 2)
Important: Utilization is calculated against the requests, not limits.
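
The table values follow directly from the formula; a quick check in Python, assuming the 300m request from the deployment manifest:

```python
def cpu_utilization(actual_millicores, request_millicores=300):
    """Utilization is measured against the pod's CPU request, not its limit."""
    return round(actual_millicores / request_millicores * 100)

for usage in (250, 285, 300, 450):
    print(f"{usage}m -> {cpu_utilization(usage)}%")
```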

Scaling Scenarios

Scenario 1: Normal Load

Time 00:00 - CPU: 70%
  Status: 1 pod running
  Action: None (below threshold)

Time 00:05 - CPU: 80%
  Status: 1 pod running
  Action: None (below threshold)

Scenario 2: Traffic Spike

Time 00:00 - CPU: 70%
  Status: 1 pod running

Time 00:10 - CPU: 98%
  Status: 1 pod running
  Action: HPA detects high CPU

Time 00:11 - CPU: 98%
  Status: Scaling to 2 pods
  Action: New pod starting

Time 00:12 - CPU: 50% (distributed across 2 pods)
  Status: 2 pods running
  Action: Stable

Scenario 3: Traffic Drop

Time 00:00 - CPU: 45% (across 2 pods)
  Status: 2 pods running
  Action: Desired replicas drops to 1 (ceil[2 × 45/95] = 1); stabilization window starts

Time 00:03 - CPU: 40% (across 2 pods)
  Status: 2 pods running
  Action: Waiting out the 5-minute stabilization window

Time 00:05 - CPU: 40% (across 2 pods)
  Status: Scaling to 1 pod
  Action: Removing 1 pod after the window elapses

Time 00:06 - CPU: 80% (1 pod)
  Status: 1 pod running
  Action: Stable
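
The per-pod numbers in these timelines come from spreading a roughly constant total load evenly across the running pods; a small sketch (the 295m figure is an illustrative total, not a measured value):

```python
def per_pod_cpu(total_load_millicores, pods, request_millicores=300):
    """Average per-pod utilization when load is spread evenly across pods."""
    return round(total_load_millicores / pods / request_millicores * 100)

# A ~295m total load saturates a single pod, but drops to about half
# per pod once HPA adds a second replica.
print(per_pod_cpu(295, 1), per_pod_cpu(295, 2))  # 98 49
```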

Metrics Server

Required Component

HPA requires metrics-server to function:
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics-server
kubectl get deployment metrics-server -n kube-system

# Test metrics collection
kubectl top nodes
kubectl top pods

Check Metrics Availability

# View pod metrics
kubectl top pod -l app=exchange-router

# Example output:
NAME                                    CPU(cores)   MEMORY(bytes)
exchange-router-deployment-7d8c9-xyz   285m         150Mi

HPA Operations

Deploy HPA

# Apply HPA configuration
kubectl apply -f hpa.yml

# Verify HPA created
kubectl get hpa

Monitor HPA Status

# Check HPA status
kubectl get hpa exchange-router-hpa

# Example output:
NAME                  REFERENCE                              TARGETS   MINPODS   MAXPODS   REPLICAS
exchange-router-hpa   Deployment/exchange-router-deployment  85%/95%   1         2         1

# Detailed information
kubectl describe hpa exchange-router-hpa

Watch Scaling Events

# Watch HPA in real-time
kubectl get hpa -w

# View scaling events
kubectl get events --field-selector involvedObject.name=exchange-router-hpa

# Example events:
SuccessfulRescale  HorizontalPodAutoscaler  Scaled up to 2 replicas
SuccessfulRescale  HorizontalPodAutoscaler  Scaled down to 1 replica

Test HPA Scaling

# Generate load to trigger scaling
kubectl run -it --rm load-generator --image=busybox --restart=Never -- sh -c "
  while true; do
    wget -q -O- http://exchange-router-service/api/v1/orders
  done
"

# Monitor CPU usage
kubectl top pods -l app=exchange-router

# Watch HPA scale up
kubectl get hpa -w

Advanced HPA Configuration

Multiple Metrics

Scale based on CPU and memory:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
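
With multiple metrics, HPA computes a desired replica count for each metric independently and scales to the largest of them. A minimal sketch of that selection (bounds simplified, tolerance omitted):

```python
import math

def desired_from_metrics(current_replicas, metrics,
                         min_replicas=1, max_replicas=5):
    """metrics is a list of (current_utilization, target_utilization)
    pairs; the highest per-metric proposal wins."""
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))

# CPU at 60% (target 80) alone would keep 2 pods, but memory at 95%
# (target 85) proposes 3 -- the higher proposal is used.
print(desired_from_metrics(2, [(60, 80), (95, 85)]))  # -> 3
```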

Custom Metrics

Scale based on custom application metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
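
For a Pods metric with an AverageValue target, the controller scales so that the metric total divided by the replica count stays at or below the target. A simplified calculation:

```python
import math

def desired_from_pods_metric(per_pod_values, target_average):
    """Scale so that sum(metric) / replicas <= target_average
    (replica bounds and tolerance omitted for brevity)."""
    return math.ceil(sum(per_pod_values) / target_average)

# 2 pods each handling 1800 req/s against a 1000 req/s target -> 4 pods
print(desired_from_pods_metric([1800, 1800], 1000))  # -> 4
```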

Scaling Behavior Control

Fine-tune scaling behavior:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      selectPolicy: Min
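
The scaleUp policies above bound how fast the replica count may grow in each 15-second period: the Percent policy allows doubling, the Pods policy allows adding 4 pods, and selectPolicy: Max picks whichever permits more. A sketch of that selection:

```python
import math

def scale_up_limit(current_replicas, percent=100, pods=4, select="Max"):
    """Replica ceiling for one scale-up period under a Percent policy
    and a Pods policy, combined via selectPolicy."""
    by_percent = current_replicas + math.ceil(current_replicas * percent / 100)
    by_pods = current_replicas + pods
    pick = max if select == "Max" else min
    return pick(by_percent, by_pods)

# From 1 replica: Percent allows 2, Pods allows 5; Max permits 5.
print(scale_up_limit(1))  # -> 5
```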

Tuning Recommendations

Production Settings

For production workloads:
minReplicas: 2      # High availability
maxReplicas: 10     # Handle traffic spikes
averageUtilization: 70  # Scale earlier

Development Settings

For development/staging:
minReplicas: 1      # Cost optimization
maxReplicas: 3      # Limited scaling
averageUtilization: 85  # Scale less aggressively

High Traffic Settings

For high-traffic scenarios:
minReplicas: 3      # Always ready
maxReplicas: 20     # Large burst capacity
averageUtilization: 60  # Aggressive scaling

Best Practices

  1. Set CPU Requests: Always define resources.requests.cpu in Deployment
  2. Conservative Limits: Start with narrow min/max range, expand as needed
  3. Monitor Metrics: Watch actual utilization patterns before tuning
  4. Avoid Flapping: Use appropriate thresholds to prevent constant scaling
  5. Test Scaling: Load test to verify HPA behavior
  6. Multiple Metrics: Consider memory and custom metrics for better decisions
  7. Stabilization Windows: Allow time for metrics to stabilize before scaling down
  8. Pod Disruption Budgets: Use PDB with HPA to maintain availability during scaling

Troubleshooting

HPA Not Scaling

# Check HPA status
kubectl describe hpa exchange-router-hpa

# Common issues:
# 1. Missing metrics-server
kubectl get deployment metrics-server -n kube-system

# 2. No CPU requests defined
kubectl get deployment exchange-router-deployment -o yaml | grep -A5 resources

# 3. Metrics not available
kubectl top pods -l app=exchange-router

Metrics Unavailable

# Check metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Verify metrics API
kubectl get apiservice v1beta1.metrics.k8s.io

# Test metrics endpoint
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods

Unexpected Scaling

# View scaling events
kubectl get events --field-selector involvedObject.name=exchange-router-hpa --sort-by='.lastTimestamp'

# Check current metrics
kubectl get hpa exchange-router-hpa -o yaml

# View pod resource usage
kubectl top pods -l app=exchange-router

Limitations

  1. Minimum Scale Interval: HPA evaluates every 15 seconds but may not scale immediately
  2. Cold Start Time: New pods take time to start and receive traffic
  3. Metrics Lag: Metrics collection has slight delay (typically 30-60 seconds)
  4. Single Deployment: Each HPA targets one Deployment/StatefulSet/ReplicaSet
  5. Resource-Based Only: Current config only uses CPU (not memory or custom metrics)
