Backend Router HPA
Location: backend/hpa.yml
Configuration Breakdown
Metadata
Scale Target
- Kind: Deployment - Scales a Deployment resource
- Name: exchange-router-deployment - Specific deployment to scale
- API Version: apps/v1 - Kubernetes API version for Deployments
Replica Bounds
- Minimum Replicas: 1 - Always maintain at least one pod
- Maximum Replicas: 2 - Never scale beyond two pods
- Current Setting: Conservative scaling for controlled resource usage
Metrics Configuration
- Type: Resource - Uses pod resource metrics (CPU/memory)
- Resource Name: cpu - Monitors CPU utilization
- Target Type: Utilization - Percentage-based threshold
- Threshold: 95% - Scale up when average CPU exceeds 95%
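Putting the fields above together, the manifest likely resembles the following sketch (assuming the autoscaling/v2 API; the HPA object's own name is not stated on this page, so the name below is hypothetical):

```yaml
# Sketch of backend/hpa.yml reconstructed from the breakdown above
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: exchange-router-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exchange-router-deployment
  minReplicas: 1
  maxReplicas: 2
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 95
```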
How HPA Works
Scaling Algorithm
Basic Formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)
Scaling Decision Loop
HPA evaluates metrics every 15 seconds (the default sync period), recomputing the desired replica count on each cycle.
Scaling Behavior
Scale Up:
- Triggers when average CPU > 95%
- Happens immediately (no delay)
- Limited by maxReplicas: 2
Scale Down:
- Triggers when average CPU < 95%
- Waits 5 minutes (default stabilization window)
- Limited by minReplicas: 1
- Gradual to prevent flapping
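These decisions follow the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), which can be sketched with shell integer arithmetic (values are illustrative):

```shell
# ceil(current_replicas * current_cpu / target_cpu) via integer arithmetic
current_replicas=1
current_cpu=150   # observed average CPU, as a percent of the request
target_cpu=95     # averageUtilization target from the HPA spec
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # prints 2; the result is then clamped to minReplicas/maxReplicas
```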
CPU Utilization Calculation
Pod CPU Request
From the deployment manifest (reference/deployments.mdx:20-24).
Utilization Percentage
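Utilization is computed per pod as actual usage ÷ CPU request × 100. Consistent with the table below, the router container presumably requests 300m:

```yaml
# Fragment of the Deployment pod spec (300m inferred from the table below)
resources:
  requests:
    cpu: 300m
```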
| Actual Usage | Request | Utilization | Action |
|---|---|---|---|
| 250m | 300m | 83% | No scaling |
| 285m | 300m | 95% | Threshold reached |
| 300m | 300m | 100% | Scale up |
| 450m | 300m | 150% | Scale up (capped at 2) |
Note: Utilization is calculated against CPU requests, not limits.
Scaling Scenarios
Scenario 1: Normal Load
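With steady traffic below the threshold, nothing happens (numbers illustrative, assuming a 300m request):

```
avg CPU: 180m / 300m request = 60%  -> desired = ceil(1 * 60/95) = 1
replicas: stays at 1
```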
Scenario 2: Traffic Spike
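A spike pushes utilization past the target and the HPA adds the second pod (numbers illustrative):

```
avg CPU: 450m / 300m request = 150% -> desired = ceil(1 * 150/95) = 2
replicas: 1 -> 2 (capped at maxReplicas: 2)
```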
Scenario 3: Traffic Drop
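When load falls, scale-down waits out the stabilization window (numbers illustrative):

```
avg CPU: 90m / 300m request = 30% on each of 2 pods -> desired = ceil(2 * 30/95) = 1
replicas: 2 -> 1 after the 5-minute stabilization window
```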
Metrics Server
Required Component
HPA requires metrics-server to function.
Check Metrics Availability
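Two quick checks that metrics-server is installed and serving pod metrics (a sketch; assumes a standard install in the kube-system namespace):

```shell
# metrics-server runs in kube-system in standard installs
kubectl -n kube-system get deployment metrics-server
# If metrics are flowing, this prints per-pod CPU/memory usage
kubectl top pods
```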
HPA Operations
Deploy HPA
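Applying the manifest and confirming the HPA object exists (the file path is from this page):

```shell
kubectl apply -f backend/hpa.yml
kubectl get hpa
```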
Monitor HPA Status
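Typical status checks (use the HPA name shown by `kubectl get hpa`; `exchange-router-hpa` below is hypothetical):

```shell
# TARGETS column shows current vs. target CPU, e.g. 83%/95%
kubectl get hpa
kubectl describe hpa exchange-router-hpa   # hypothetical name
```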
Watch Scaling Events
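Watching scaling decisions as they happen:

```shell
kubectl get hpa -w                           # live REPLICAS/TARGETS updates
kubectl get events --sort-by=.lastTimestamp  # includes SuccessfulRescale events
```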
Test HPA Scaling
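To exercise scaling, generate sustained load against the router (a sketch adapted from the Kubernetes HPA walkthrough; the Service URL below is hypothetical):

```shell
# NOTE: replace exchange-router-service with the router's actual Service name
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c 'while true; do wget -q -O- http://exchange-router-service; done'
```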
Advanced HPA Configuration
Multiple Metrics
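Scaling on both CPU and memory can be expressed by listing two resource metrics; the HPA scales on whichever metric yields the most replicas (a sketch, autoscaling/v2; the memory target is illustrative):

```yaml
# Sketch: two resource metrics; HPA uses the larger desired replica count
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 95
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80   # illustrative memory target
```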
Scale based on CPU and memory.
Custom Metrics
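Custom metrics require a custom-metrics API adapter (e.g. prometheus-adapter) to be installed; a sketch of a Pods-type metric, with a hypothetical metric name:

```yaml
# Sketch: requires a custom metrics adapter; metric name is hypothetical
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```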
Scale based on custom application metrics.
Scaling Behavior Control
Fine-tune scaling behavior.
Tuning Recommendations
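Behavior control uses the `behavior` field of autoscaling/v2; a sketch that slows scale-down while keeping scale-up immediate (not part of the current backend/hpa.yml):

```yaml
# Sketch only - illustrative behavior block
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60             # remove at most one pod per minute
  scaleUp:
    stabilizationWindowSeconds: 0     # scale up immediately
```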
Production Settings
For production workloads.
Development Settings
For development/staging.
High Traffic Settings
For high-traffic scenarios.
Best Practices
- Set CPU Requests: Always define resources.requests.cpu in the Deployment
- Conservative Limits: Start with a narrow min/max range, expand as needed
- Monitor Metrics: Watch actual utilization patterns before tuning
- Avoid Flapping: Use appropriate thresholds to prevent constant scaling
- Test Scaling: Load test to verify HPA behavior
- Multiple Metrics: Consider memory and custom metrics for better decisions
- Stabilization Windows: Allow time for metrics to stabilize before scaling down
- Pod Disruption Budgets: Use PDB with HPA to maintain availability during scaling
Troubleshooting
HPA Not Scaling
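First inspect the HPA's conditions and events, and confirm the target Deployment actually defines a CPU request (the HPA name is hypothetical):

```shell
kubectl describe hpa exchange-router-hpa   # check Conditions and Events
# HPA cannot compute utilization without resources.requests.cpu
kubectl get deployment exchange-router-deployment \
  -o jsonpath='{.spec.template.spec.containers[*].resources.requests.cpu}'
```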
Metrics Unavailable
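Targets showing `<unknown>` usually mean metrics-server is missing or unhealthy:

```shell
# Standard metrics-server installs carry the k8s-app=metrics-server label
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl top pods   # fails if metrics-server is unavailable
```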
Unexpected Scaling
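Review recent scaling decisions and the utilization values the HPA observed when it made them:

```shell
kubectl get hpa -w   # watch REPLICAS/TARGETS change over time
kubectl get events --sort-by=.lastTimestamp | grep -i horizontalpodautoscaler
```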
Limitations
- Minimum Scale Interval: HPA evaluates every 15 seconds but may not scale immediately
- Cold Start Time: New pods take time to start and receive traffic
- Metrics Lag: Metrics collection has slight delay (typically 30-60 seconds)
- Single Deployment: Each HPA targets one Deployment/StatefulSet/ReplicaSet
- Resource-Based Only: Current config only uses CPU (not memory or custom metrics)
Related Resources
- Deployments - Backend router deployment scaled by this HPA
- Services - Service load balancing across scaled pods
- Kubernetes HPA Documentation
- Metrics Server

