This page documents the HorizontalPodAutoscaler (HPA) configuration used in the exchange platform. HPA automatically scales the number of pods based on observed metrics like CPU utilization.

Backend Router HPA

Location: backend/hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 95

Configuration Breakdown

Metadata

metadata:
  name: exchange-router-hpa
Identifies the HPA resource in the cluster.

Scale Target

scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: exchange-router-deployment
Target Configuration:
  • Kind: Deployment - Scales a Deployment resource
  • Name: exchange-router-deployment - Specific deployment to scale
  • API Version: apps/v1 - Kubernetes API version for Deployments

Replica Bounds

minReplicas: 1
maxReplicas: 2
Scaling Boundaries:
  • Minimum Replicas: 1 - Always maintain at least one pod
  • Maximum Replicas: 2 - Never scale beyond two pods
  • Current Setting: Conservative scaling for controlled resource usage
Scaling Behavior:
Low Traffic:  1 pod  (CPU < 95%)
High Traffic: 2 pods (CPU ≥ 95%)

Metrics Configuration

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 95
Metric Details:
  • Type: Resource - Uses pod resource metrics (CPU/memory)
  • Resource Name: cpu - Monitors CPU utilization
  • Target Type: Utilization - Percentage-based threshold
  • Threshold: 95% - Scale up when average CPU exceeds 95%

How HPA Works

Scaling Algorithm

Basic Formula:
desiredReplicas = ceil[currentReplicas × (currentMetricValue / targetMetricValue)]
Example Calculation:
Current: 1 pod at 100% CPU
Target: 95% CPU

desiredReplicas = ceil[1 × (100 / 95)] = ceil[1.05] = 2 pods
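
This calculation can be reproduced with a short Python sketch (a simplification: the real controller also applies a tolerance band, 10% by default, before acting on the ratio):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=2):
    """Core HPA formula, clamped to the configured replica bounds."""
    desired = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(1, 100, 95))  # 1 pod at 100% CPU -> scales to 2
```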

Scaling Decision Loop

HPA evaluates metrics every 15 seconds (default):
1. Query Metrics

   Fetch CPU utilization from metrics-server

2. Calculate Average

   Average CPU across all pods: 98%

3. Compare to Target

   98% > 95% → Need to scale up

4. Calculate Desired Replicas

   ceil[1 × (98/95)] = 2 pods

5. Update Deployment

   Set replicas: 1 → 2
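
The five steps above can be sketched as one iteration of a simplified control loop (hypothetical helper names; the real controller also applies a tolerance band and stabilization windows):

```python
import math

def hpa_sync(fetch_cpu_percents, current_replicas,
             target=95, min_replicas=1, max_replicas=2):
    """One sync period of a simplified HPA decision loop.

    fetch_cpu_percents stands in for the metrics-server query and
    returns the CPU utilization of each pod behind the target.
    """
    samples = fetch_cpu_percents()                  # 1. query metrics
    average = sum(samples) / len(samples)           # 2. average across pods
    ratio = average / target                        # 3. compare to target
    desired = math.ceil(current_replicas * ratio)   # 4. desired replicas
    # 5. the caller would patch spec.replicas on the Deployment
    return max(min_replicas, min(max_replicas, desired))

print(hpa_sync(lambda: [98.0], 1))  # 98% > 95% -> scale 1 -> 2
```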

Scaling Behavior

Scale Up:
  • Triggers when average CPU > 95%
  • Happens immediately (default scale-up stabilization window is 0 seconds)
  • Limited by maxReplicas: 2
Scale Down:
  • Triggers when the computed desired replica count falls below the current count (with 2 pods and a 95% target, average CPU must drop below ~47.5%)
  • Waits 5 minutes (default stabilization window)
  • Limited by minReplicas: 1
  • Gradual to prevent flapping
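
Scale-down stabilization can be pictured as acting on the highest replica recommendation seen during the window, so a short dip in load does not immediately remove pods (a sketch of the mechanism, not the controller's actual code):

```python
def stabilized_replicas(recent_recommendations):
    """Scale-down stabilization: act on the highest replica count
    recommended during the stabilization window."""
    return max(recent_recommendations)

# Recommendations from the last few sync periods inside a 5-minute
# window: the dip to 1 pod is ignored while any 2-pod recommendation
# is still in the window.
print(stabilized_replicas([2, 2, 1, 1]))  # -> 2
```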

CPU Utilization Calculation

Pod CPU Request

From deployment manifest (reference/deployments.mdx:20-24):
resources:
  requests:
    cpu: "300m"
  limits:
    cpu: "2000m"
CPU Request: 300 millicores (0.3 cores)

Utilization Percentage

CPU Utilization % = (Actual CPU Usage / CPU Request) × 100
Examples:
Actual Usage   Request   Utilization   Action
------------   -------   -----------   ----------------------
250m           300m      83%           No scaling
285m           300m      95%           Threshold reached
300m           300m      100%          Scale up
450m           300m      150%          Scale up (capped at 2)
Important: Utilization is calculated against the requests, not limits.
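
The table values follow directly from the formula; a quick check in Python, assuming the 300m request from the deployment manifest:

```python
def cpu_utilization(actual_millicores, request_millicores=300):
    """Utilization is measured against the pod's CPU request, not its limit."""
    return round(actual_millicores / request_millicores * 100)

for usage in (250, 285, 300, 450):
    print(f"{usage}m -> {cpu_utilization(usage)}%")
```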

Scaling Scenarios

Scenario 1: Normal Load

Time 00:00 - CPU: 70%
  Status: 1 pod running
  Action: None (below threshold)

Time 00:05 - CPU: 80%
  Status: 1 pod running
  Action: None (below threshold)

Scenario 2: Traffic Spike

Time 00:00 - CPU: 70%
  Status: 1 pod running

Time 00:10 - CPU: 98%
  Status: 1 pod running
  Action: HPA detects high CPU

Time 00:11 - CPU: 98%
  Status: Scaling to 2 pods
  Action: New pod starting

Time 00:12 - CPU: 50% (distributed across 2 pods)
  Status: 2 pods running
  Action: Stable

Scenario 3: Traffic Drop

Time 00:00 - CPU: 45% (across 2 pods)
  Status: 2 pods running
  Action: Desired replicas drops to 1 (ceil[2 × 45/95] = 1); stabilization window starts

Time 00:03 - CPU: 40% (across 2 pods)
  Status: 2 pods running
  Action: Waiting out the 5-minute stabilization window

Time 00:05 - CPU: 40% (across 2 pods)
  Status: Scaling to 1 pod
  Action: Removing 1 pod after the window elapses

Time 00:06 - CPU: 80% (1 pod)
  Status: 1 pod running
  Action: Stable
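
The per-pod numbers in these timelines come from spreading a roughly constant total load evenly across the running pods; a small sketch (the 295m figure is an illustrative total, not a measured value):

```python
def per_pod_cpu(total_load_millicores, pods, request_millicores=300):
    """Average per-pod utilization when load is spread evenly across pods."""
    return round(total_load_millicores / pods / request_millicores * 100)

# A ~295m total load saturates a single pod, but drops to about half
# per pod once HPA adds a second replica.
print(per_pod_cpu(295, 1), per_pod_cpu(295, 2))  # 98 49
```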

Metrics Server

Required Component

HPA requires metrics-server to function:
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics-server
kubectl get deployment metrics-server -n kube-system

# Test metrics collection
kubectl top nodes
kubectl top pods

Check Metrics Availability

# View pod metrics
kubectl top pod -l app=exchange-router

# Example output:
NAME                                    CPU(cores)   MEMORY(bytes)
exchange-router-deployment-7d8c9-xyz   285m         150Mi

HPA Operations

Deploy HPA

# Apply HPA configuration
kubectl apply -f hpa.yml

# Verify HPA created
kubectl get hpa

Monitor HPA Status

# Check HPA status
kubectl get hpa exchange-router-hpa

# Example output:
NAME                  REFERENCE                              TARGETS   MINPODS   MAXPODS   REPLICAS
exchange-router-hpa   Deployment/exchange-router-deployment  85%/95%   1         2         1

# Detailed information
kubectl describe hpa exchange-router-hpa

Watch Scaling Events

# Watch HPA in real-time
kubectl get hpa -w

# View scaling events
kubectl get events --field-selector involvedObject.name=exchange-router-hpa

# Example events:
SuccessfulRescale  HorizontalPodAutoscaler  Scaled up to 2 replicas
SuccessfulRescale  HorizontalPodAutoscaler  Scaled down to 1 replica

Test HPA Scaling

# Generate load to trigger scaling
kubectl run -it --rm load-generator --image=busybox --restart=Never -- sh -c "
  while true; do
    wget -q -O- http://exchange-router-service/api/v1/orders
  done
"

# Monitor CPU usage
kubectl top pods -l app=exchange-router

# Watch HPA scale up
kubectl get hpa -w

Advanced HPA Configuration

Multiple Metrics

Scale based on CPU and memory:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 85
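
With multiple metrics, HPA computes a desired replica count for each metric independently and scales to the largest of them. A minimal sketch of that selection (bounds simplified, tolerance omitted):

```python
import math

def desired_from_metrics(current_replicas, metrics,
                         min_replicas=1, max_replicas=5):
    """metrics is a list of (current_utilization, target_utilization)
    pairs; the highest per-metric proposal wins."""
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))

# CPU at 60% (target 80) alone would keep 2 pods, but memory at 95%
# (target 85) proposes 3 -- the higher proposal is used.
print(desired_from_metrics(2, [(60, 80), (95, 85)]))  # -> 3
```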

Custom Metrics

Scale based on custom application metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
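
For a Pods metric with an AverageValue target, the controller scales so that the metric total divided by the replica count stays at or below the target. A simplified calculation:

```python
import math

def desired_from_pods_metric(per_pod_values, target_average):
    """Scale so that sum(metric) / replicas <= target_average
    (replica bounds and tolerance omitted for brevity)."""
    return math.ceil(sum(per_pod_values) / target_average)

# 2 pods each handling 1800 req/s against a 1000 req/s target -> 4 pods
print(desired_from_pods_metric([1800, 1800], 1000))  # -> 4
```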

Scaling Behavior Control

Fine-tune scaling behavior:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      selectPolicy: Min
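
The scaleUp policies above bound how fast the replica count may grow in each 15-second period: the Percent policy allows doubling, the Pods policy allows adding 4 pods, and selectPolicy: Max picks whichever permits more. A sketch of that selection:

```python
import math

def scale_up_limit(current_replicas, percent=100, pods=4, select="Max"):
    """Replica ceiling for one scale-up period under a Percent policy
    and a Pods policy, combined via selectPolicy."""
    by_percent = current_replicas + math.ceil(current_replicas * percent / 100)
    by_pods = current_replicas + pods
    pick = max if select == "Max" else min
    return pick(by_percent, by_pods)

# From 1 replica: Percent allows 2, Pods allows 5; Max permits 5.
print(scale_up_limit(1))  # -> 5
```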

Tuning Recommendations

Production Settings

For production workloads:
minReplicas: 2      # High availability
maxReplicas: 10     # Handle traffic spikes
averageUtilization: 70  # Scale earlier

Development Settings

For development/staging:
minReplicas: 1      # Cost optimization
maxReplicas: 3      # Limited scaling
averageUtilization: 85  # Scale less aggressively

High Traffic Settings

For high-traffic scenarios:
minReplicas: 3      # Always ready
maxReplicas: 20     # Large burst capacity
averageUtilization: 60  # Aggressive scaling

Best Practices

  1. Set CPU Requests: Always define resources.requests.cpu in Deployment
  2. Conservative Limits: Start with narrow min/max range, expand as needed
  3. Monitor Metrics: Watch actual utilization patterns before tuning
  4. Avoid Flapping: Use appropriate thresholds to prevent constant scaling
  5. Test Scaling: Load test to verify HPA behavior
  6. Multiple Metrics: Consider memory and custom metrics for better decisions
  7. Stabilization Windows: Allow time for metrics to stabilize before scaling down
  8. Pod Disruption Budgets: Use PDB with HPA to maintain availability during scaling

Troubleshooting

HPA Not Scaling

# Check HPA status
kubectl describe hpa exchange-router-hpa

# Common issues:
# 1. Missing metrics-server
kubectl get deployment metrics-server -n kube-system

# 2. No CPU requests defined
kubectl get deployment exchange-router-deployment -o yaml | grep -A5 resources

# 3. Metrics not available
kubectl top pods -l app=exchange-router

Metrics Unavailable

# Check metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Verify metrics API
kubectl get apiservice v1beta1.metrics.k8s.io

# Test metrics endpoint
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods

Unexpected Scaling

# View scaling events
kubectl get events --field-selector involvedObject.name=exchange-router-hpa --sort-by='.lastTimestamp'

# Check current metrics
kubectl get hpa exchange-router-hpa -o yaml

# View pod resource usage
kubectl top pods -l app=exchange-router

Limitations

  1. Minimum Scale Interval: HPA evaluates every 15 seconds but may not scale immediately
  2. Cold Start Time: New pods take time to start and receive traffic
  3. Metrics Lag: Metrics collection has slight delay (typically 30-60 seconds)
  4. Single Deployment: Each HPA targets one Deployment/StatefulSet/ReplicaSet
  5. Resource-Based Only: Current config only uses CPU (not memory or custom metrics)
