Autoscaling Configurations
This reference documents the autoscaling and availability configurations that ensure the platform scales efficiently and maintains high availability.

Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of pods based on CPU and memory utilization.

Backend HPA
The backend autoscaler maintains API responsiveness under varying load:

platform/kubernetes/backend/hpa.yaml:17-32
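The referenced manifest is not reproduced here; as a minimal sketch, a backend HPA of this shape typically looks like the following (the resource names, namespace, and label choices are assumptions; the replica bounds come from the Scaling Limits table below):

```yaml
# Illustrative sketch of the backend HPA resource (names are assumed)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
  namespace: platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 2    # keep at least 2 pods for availability
  maxReplicas: 10   # cap cost under heavy load
```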
Scaling Metrics
platform/kubernetes/backend/hpa.yaml:35-59
| Metric | Target | Behavior |
|---|---|---|
| CPU | 70% average | Scale up when sustained CPU usage exceeds 70% |
| Memory | 80% average | Scale up when sustained memory usage exceeds 80% |
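A metrics block matching the targets in the table above might look like this (a sketch, not the actual file contents):

```yaml
# Resource metrics matching the documented targets (70% CPU, 80% memory)
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```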
Scaling Behavior
Scale Up Policy:

platform/kubernetes/backend/hpa.yaml:66-76
- Stabilization: Wait 60 seconds to confirm increased load
- Max Pods: Add up to 2 pods at once
- Max Percent: Scale up by 50% per cycle
- Policy: Use whichever adds more pods (aggressive scale-up)
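The scale-up bullets above translate into a `behavior.scaleUp` block along these lines (a sketch; the `periodSeconds` value of 60 is an assumption):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60   # confirm increased load for 60s first
    selectPolicy: Max                # apply whichever policy adds more pods
    policies:
      - type: Pods
        value: 2                     # add at most 2 pods per period
        periodSeconds: 60
      - type: Percent
        value: 50                    # or grow by 50% per period
        periodSeconds: 60
```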
Scale Down Policy:

platform/kubernetes/backend/hpa.yaml:78-85
- Stabilization: Wait 5 minutes (300s) to confirm decreased load
- Max Pods: Remove only 1 pod at a time
- Period: Wait 2 minutes between scale-down events
- Policy: Conservative scale-down to prevent thrashing
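The corresponding `behavior.scaleDown` block would look roughly like this (a sketch based on the bullets above):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 1                     # remove at most 1 pod
        periodSeconds: 120           # per 2-minute window
```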
Frontend HPA
The frontend autoscaler handles web traffic spikes:

platform/kubernetes/frontend/hpa.yaml:15-29
Scaling Metrics
platform/kubernetes/frontend/hpa.yaml:31-44
| Metric | Target | Rationale |
|---|---|---|
| CPU | 80% average | Nginx is efficient; higher threshold acceptable |
| Memory | 85% average | Static files use stable memory |
Scaling Behavior
Scale Up Policy:

platform/kubernetes/frontend/hpa.yaml:47-56
- Stabilization: 30 seconds (faster than backend)
- Can double: 100% increase if needed
- Stateless: Safe to scale aggressively
Scale Down Policy:

platform/kubernetes/frontend/hpa.yaml:59-65
- Stabilization: 2 minutes (faster than backend’s 5 minutes)
- Remove up to 2 pods: Frontend is stateless, safer to scale down faster
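Taken together, the frontend behavior section might be sketched as follows (the `periodSeconds` values are assumptions; the windows and pod counts come from the bullets above):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 30   # faster than the backend's 60s
    policies:
      - type: Percent
        value: 100                   # allowed to double the replica count
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 120  # 2 minutes vs. the backend's 5
    policies:
      - type: Pods
        value: 2                     # stateless, so removing 2 at a time is safe
        periodSeconds: 60
```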
Pod Disruption Budgets (PDB)
PDBs ensure availability during voluntary disruptions (node drains, cluster upgrades).

Backend PDB
platform/kubernetes/pdb.yaml:29-43
Guarantee: At least 1 backend pod must remain available during voluntary disruptions.
With minReplicas: 2 in HPA, this ensures:
- During maintenance, Kubernetes won’t drain both backend pods simultaneously
- API remains available during node replacements
- Rolling updates proceed safely
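A PDB expressing this guarantee is short; as a sketch (the name, namespace, and selector labels are assumptions):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-pdb
  namespace: platform
spec:
  minAvailable: 1        # at least 1 backend pod survives voluntary disruptions
  selector:
    matchLabels:
      app: backend
```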
Frontend PDB
platform/kubernetes/pdb.yaml:49-61
Guarantee: At least 1 frontend pod must remain available during voluntary disruptions.
Database PDB
platform/kubernetes/pdb.yaml:71-83
Guarantee: The single PostgreSQL pod cannot be voluntarily disrupted.
Important: With only 1 replica in the StatefulSet:
- PDB status (e.g. `kubectl get pdb`) shows ALLOWED DISRUPTIONS: 0
- The database cannot be drained during maintenance without manual intervention
- In production, use managed RDS instead of running PostgreSQL in Kubernetes
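With `minAvailable: 1` against a single-replica StatefulSet, the allowed-disruptions count computes to zero. A sketch of such a PDB (name and labels are assumptions):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
spec:
  minAvailable: 1        # with 1 replica, this yields 0 allowed disruptions
  selector:
    matchLabels:
      app: postgres
```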
Autoscaling Formula
HPA calculates desired replicas using:

desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

Example: Backend Scaling

Current State:
- Current replicas: 2
- Current CPU usage: 85%
- Target CPU: 70%
- Desired replicas: ceil(2 × 85 / 70) = 3, so HPA adds one pod
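As a sanity check, the standard HPA desired-replica calculation (ceiling of current replicas scaled by the utilization ratio) can be expressed in a few lines of Python:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """HPA scaling rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# Backend example: 2 replicas running at 85% CPU against a 70% target
print(desired_replicas(2, 85, 70))  # -> 3 (HPA adds one pod)

# At exactly the target, no scaling occurs
print(desired_replicas(2, 70, 70))  # -> 2
```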
Scaling Limits
| Component | Min Replicas | Max Replicas | Reason |
|---|---|---|---|
| Backend | 2 | 10 | API availability and cost balance |
| Frontend | 2 | 8 | Nginx efficiency allows fewer replicas |
| Database | 1 | 1 | StatefulSet (use managed RDS in production) |
Monitoring Autoscaling
View HPA status with `kubectl get hpa` (add `-w` to watch replica counts change), and `kubectl describe hpa <name>` for recent scaling events.

Best Practices
- Set Requests and Limits: HPA requires resource requests to calculate utilization
- Conservative Targets: Leave headroom (70-80%) before saturation
- Slow Scale Down: Prevent thrashing from temporary load decreases
- Fast Scale Up: Respond quickly to increased demand
- PDB Protection: Always set PDBs to maintain availability during maintenance
- Monitor Metrics: Use Prometheus/Grafana to observe scaling behavior
Integration with Deployment
HPA works with deployment resource requests. For example, if a container requests 250m CPU and currently consumes 175m:
- Utilization = 175m / 250m = 70%
- HPA sees 70% utilization, which matches the backend's target, so no scaling occurs
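The deployment side of this contract is the container's `resources` block; a sketch consistent with the arithmetic above (the memory values and limits are illustrative assumptions):

```yaml
# Container resources in the backend Deployment (values illustrative)
resources:
  requests:
    cpu: 250m        # HPA utilization is measured against this request
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

Without `requests`, the HPA cannot compute utilization percentages at all, which is why "Set Requests and Limits" leads the best-practices list.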