# Scaling Aurora
Guidelines for scaling Aurora to handle increased load, traffic, and concurrent users.

## Scaling Strategy
Aurora components fall into three categories.

### Stateless Services (Horizontally Scalable)
These services can be scaled by increasing the replica count:

- `aurora-server` - REST API (handles HTTP requests)
- `celery-worker` - Background tasks (RCA analysis, integrations)
- `chatbot` - WebSocket server (chat interface)
- `frontend` - Next.js UI (serves web pages)
- `searxng` - Web search engine
- `t2v-transformers` - ML embeddings
### Stateful Services (Require Special Configuration)
These services require additional setup for horizontal scaling:

- `postgres` - Database (replication, read replicas)
- `redis` - Cache and queue (Redis Cluster)
- `weaviate` - Vector database (multi-node cluster)
- `vault` - Secrets management (Raft HA)
### Single-Instance Services
These MUST remain at 1 replica:

- `celery-beat` - Task scheduler (multiple instances cause duplicate tasks)
## Horizontal Scaling
### Kubernetes (Helm)
Increase replica counts in `values.generated.yaml`:
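A sketch of what the override might look like — the top-level key names (`auroraServer`, `celeryWorker`, `chatbot`) are assumptions; match them to your chart's actual values layout:

```yaml
# values.generated.yaml -- key names are illustrative
auroraServer:
  replicaCount: 3      # REST API
celeryWorker:
  replicaCount: 4      # background task workers
chatbot:
  replicaCount: 2      # WebSocket server
```

Apply the change with `helm upgrade` after editing.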
### Docker Compose
Scale services manually from the command line, or set replica counts in `docker-compose.yaml`:
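With Compose, the `--scale` flag overrides the replica count per service (service names here are assumed to match `docker-compose.yaml`). Note that a service with a fixed `container_name` or a published host port cannot run more than one replica:

```shell
# Run 3 API replicas and 4 workers
docker compose up -d --scale aurora-server=3 --scale celery-worker=4
```

Alternatively, set `deploy.replicas` on the service definition in `docker-compose.yaml` so the count survives restarts.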
## Auto-Scaling
### Horizontal Pod Autoscaler (HPA)
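A minimal `autoscaling/v2` manifest, assuming the Deployment is named `aurora-server` (adjust to the name your chart generates):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

The HPA controller needs metrics-server installed in the cluster to read CPU/memory usage.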
The HPA scales replicas automatically based on CPU or memory utilization.

### Custom Metrics Autoscaling
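Scaling workers on queue depth requires exposing the metric to Kubernetes through an adapter (for example, prometheus-adapter, or KEDA as an alternative mechanism); the metric name below is an assumption:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: celery-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: celery-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: celery_queue_length   # assumed metric name exposed via an adapter
        target:
          type: AverageValue
          averageValue: "30"          # target ~30 queued tasks per worker
```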
Autoscaling can also be driven by application metrics, such as Celery queue length.

## Vertical Scaling
### Increase Resource Limits
For Kubernetes, update `values.generated.yaml`:
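A hedged sketch — the key layout is illustrative and should be matched to the chart:

```yaml
# values.generated.yaml -- key names are illustrative
auroraServer:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
```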
### Vertical Pod Autoscaler (VPA)
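A minimal VPA object, assuming the VPA controller is installed in the cluster and the Deployment is named `aurora-server`:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: aurora-server
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-server
  updatePolicy:
    updateMode: "Auto"   # VPA evicts and recreates pods with new requests
```

Avoid running a VPA in `Auto` mode alongside an HPA that scales on the same CPU/memory metrics — the two controllers will fight each other.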
The VPA adjusts resource requests automatically based on observed usage.

## Database Scaling
### PostgreSQL
#### Read Replicas
For read-heavy workloads, add read replicas.

**Managed Database (Recommended):**

- AWS RDS: Create read replicas via the console
- GCP Cloud SQL: Enable read replicas
- Azure Database: Add read replicas
#### Connection Pooling
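A sketch of a `pgbouncer.ini` for transaction pooling — the host, database name, and pool sizes are assumptions to adapt:

```ini
[databases]
aurora = host=postgres port=5432 dbname=aurora

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
pool_mode = transaction   ; return server connections between transactions
max_client_conn = 1000    ; clients PgBouncer will accept
default_pool_size = 20    ; actual PostgreSQL connections per database
```

Point the application's database URL at port 6432 instead of at PostgreSQL directly.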
PgBouncer pools connections so that many application replicas share a small set of PostgreSQL connections.

## Redis Scaling
### Redis Cluster
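Each node runs with cluster mode enabled, and the cluster is then formed once with `redis-cli` (hostnames are placeholders):

```shell
# redis.conf on every node:
#   cluster-enabled yes
#   cluster-config-file nodes.conf
#   cluster-node-timeout 5000

# One-time: form a 3-master cluster with one replica per master
redis-cli --cluster create redis-0:6379 redis-1:6379 redis-2:6379 \
  redis-3:6379 redis-4:6379 redis-5:6379 --cluster-replicas 1
```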
Redis Cluster provides high availability and sharding.

### Redis Sentinel
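A minimal `sentinel.conf` monitoring one master with a quorum of 2 (names and timeouts are placeholders):

```
sentinel monitor aurora-master redis-master 6379 2
sentinel down-after-milliseconds aurora-master 5000
sentinel failover-timeout aurora-master 60000
```

Run at least three Sentinel processes so a quorum can form during a failover.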
Redis Sentinel provides automatic failover without sharding.

## Vector Database Scaling
### Weaviate Clustering
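For a self-hosted multi-node setup, Weaviate nodes discover each other via gossip. A hedged sketch of the environment for a second node joining the first (variable names follow Weaviate's clustering docs; values are placeholders):

```yaml
# docker-compose.yaml (node 2) -- values are placeholders
environment:
  CLUSTER_HOSTNAME: "node2"
  CLUSTER_GOSSIP_BIND_PORT: "7100"
  CLUSTER_DATA_BIND_PORT: "7101"
  CLUSTER_JOIN: "node1:7100"   # gossip address of an existing node
```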
For production, use a multi-node cluster or Weaviate Cloud (recommended).

## Load Balancing
### Ingress Session Affinity
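With ingress-nginx, cookie-based affinity is enabled per-Ingress via annotations (the cookie name is arbitrary):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "aurora-affinity"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
```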
WebSocket connections (the `chatbot` service) need session affinity so that a client stays pinned to the same pod.

### External Load Balancer
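With the AWS Load Balancer Controller, annotations like these provision an ALB; the long idle timeout keeps WebSocket connections alive (values are illustrative):

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=3600
spec:
  ingressClassName: alb
```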
For cloud deployments, terminate traffic at a managed load balancer such as AWS ALB.

## Monitoring Scaling
### Key Metrics to Track
- Request rate (requests/sec)
- Response time (p50, p95, p99)
- Error rate (%)
- CPU usage (%)
- Memory usage (%)
- Celery queue length
- Database connections
- Redis memory usage
### Grafana Dashboard
Import the Aurora dashboard into Grafana.

## Performance Optimization
### Caching
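As an illustration of the pattern (not Aurora's actual cache layer), a minimal in-process TTL cache; a production setup would typically back this with Redis so all replicas share the cache:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=60):
    """Memoize a function's results for ttl_seconds, then recompute."""
    def decorator(fn):
        store = {}  # args -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: serve from cache
            value = fn(*args)
            store[args] = (now + ttl_seconds, value)
            return value

        return wrapper
    return decorator

calls = 0

@ttl_cache(ttl_seconds=300)
def expensive_lookup(key):
    # Stand-in for a slow LLM or database call
    global calls
    calls += 1
    return key.upper()

expensive_lookup("incident")
expensive_lookup("incident")  # cache hit: underlying function not called again
print(calls)  # → 1
```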
Enable aggressive caching to avoid recomputing repeated requests.

### Cost Optimization
During scale-out, review settings that reduce LLM costs.

## Testing Scaling
### Load Testing
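A basic k6 script for sustained load — the endpoint path is an assumption, so point it at a cheap health or list route:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,          // concurrent virtual users
  duration: '2m',   // sustained load
};

export default function () {
  const res = http.get('http://localhost:8080/api/health'); // assumed endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```

Run it with `k6 run`, passing the script file (saved as, for example, `load-test.js`).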
Use k6 to drive sustained load against the API.

### Stress Testing
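For stress testing, a k6 `stages` ramp pushes load past the expected peak and then backs off, to find the breaking point and check recovery (targets and endpoint are illustrative):

```javascript
import http from 'k6/http';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp to normal peak
    { duration: '5m', target: 100 },  // hold
    { duration: '2m', target: 300 },  // push past expected peak
    { duration: '2m', target: 0 },    // back off and observe recovery
  ],
};

export default function () {
  http.get('http://localhost:8080/api/health'); // assumed endpoint
}
```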
## Scaling Checklist
Before scaling to production:

- Resource requests and limits configured
- HPA configured for stateless services
- Database connection pooling enabled
- Redis clustering or Sentinel configured
- Weaviate clustering or cloud instance
- Session affinity enabled for WebSocket
- Monitoring and alerting set up
- Load testing performed
- Cost optimization settings reviewed
- Backup strategy scales with data growth
- Log aggregation handles increased volume
- Network policies allow pod-to-pod communication
## Common Scaling Issues
### Pod OOMKilled
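Raise the limit for the affected service in `values.generated.yaml` (key names are illustrative), then redeploy:

```yaml
auroraServer:
  resources:
    limits:
      memory: 2Gi   # raise from the previous limit
```

`kubectl describe pod <name>` shows `OOMKilled` in the container's last state when this is the cause.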
Increase memory limits for the affected service.

### Database Connection Exhaustion
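If PgBouncer is not an option, raise PostgreSQL's limit in `postgresql.conf` — but note that each connection costs memory, so pooling is usually the better fix:

```
# postgresql.conf
max_connections = 200
```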
Add PgBouncer or raise PostgreSQL's connection limit.

### Redis Memory Issues
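In `redis.conf`, cap memory and evict least-recently-used keys when the cap is reached:

```
maxmemory 2gb
maxmemory-policy allkeys-lru
```

If the same Redis instance also backs the Celery queue, prefer `volatile-lru` (or a separate instance) so that queue entries without a TTL are never evicted.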
Increase Redis memory or configure an eviction policy.

### Slow Response Times
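A quick in-process look with the standard library's `cProfile` (the function here is a stand-in for a slow handler); `py-spy` can instead attach to a running process without code changes:

```python
import cProfile
import io
import pstats

def slow_endpoint():
    # Stand-in for a slow request handler
    total = 0
    for i in range(100_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_endpoint()
profiler.disable()

# Print the 10 most expensive calls by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print("slow_endpoint" in report)  # → True
```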
Profile the application to find hotspots.

## Next Steps
- **Production Best Practices** - Security and reliability for production
- **Monitoring** - Set up comprehensive monitoring
- **Performance Tuning** - Optimize Aurora performance
- **Troubleshooting** - Common scaling issues