Scaling Aurora

Guidelines for scaling Aurora to handle increased load, traffic, and concurrent users.

Scaling Strategy

Aurora components fall into three categories:

Stateless Services (Horizontally Scalable)

These services can be scaled by increasing replica count:
  • aurora-server - REST API (handles HTTP requests)
  • celery-worker - Background tasks (RCA analysis, integrations)
  • chatbot - WebSocket server (chat interface)
  • frontend - Next.js UI (serves web pages)
  • searxng - Web search engine
  • t2v-transformers - ML embeddings

Stateful Services (Require Special Configuration)

These services require additional setup for horizontal scaling:
  • postgres - Database (replication, read replicas)
  • redis - Cache and queue (Redis Cluster)
  • weaviate - Vector database (multi-node cluster)
  • vault - Secrets management (Raft HA)

Single-Instance Services

These MUST remain at 1 replica:
  • celery-beat - Task scheduler (multiple instances cause duplicate tasks)

Horizontal Scaling

Kubernetes (Helm)

Increase replica counts in values.generated.yaml:
replicaCounts:
  # Scale based on traffic
  server: 5          # API requests
  celeryWorker: 10   # Background tasks
  chatbot: 3         # WebSocket connections
  frontend: 3        # Web traffic
  
  # Scale for performance
  searxng: 2         # Web search
  transformers: 2    # ML embeddings
  
  # Keep at 1
  celeryBeat: 1      # DO NOT SCALE
Apply changes:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml

Docker Compose

Scale services manually:
# Scale specific service
docker compose up -d --scale celery_worker=5

# Scale multiple services
docker compose up -d \
  --scale celery_worker=5 \
  --scale aurora-server=3
Or edit docker-compose.yaml:
services:
  celery_worker:
    # ... existing config ...
    deploy:
      replicas: 5

Auto-Scaling

Horizontal Pod Autoscaler (HPA)

Automatically scale based on CPU/memory:
1. Enable metrics server

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
2. Create HPA for API server

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-server-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
Apply:
kubectl apply -f aurora-server-hpa.yaml
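The HPA scales using the standard formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A quick sketch of that arithmetic for the 70% CPU target above:

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Standard HPA formula: ceil(current * current/target), clamped to bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas at 95% average CPU against a 70% target -> scale up to 5
print(desired_replicas(3, 95, 70, min_replicas=3, max_replicas=10))  # 5
```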
3. Create HPA for Celery workers

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-celery-worker-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-celery-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
4. Monitor autoscaling

# Check HPA status
kubectl get hpa -n aurora

# Watch scaling events
kubectl get hpa -n aurora -w

# View scaling events
kubectl describe hpa aurora-server-hpa -n aurora

Custom Metrics Autoscaling

Scale based on application metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-celery-queue-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-celery-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # Scale based on Celery queue length
    - type: External
      external:
        metric:
          name: redis_celery_queue_length
          selector:
            matchLabels:
              queue: celery
        target:
          type: AverageValue
          averageValue: "10"  # 10 tasks per worker
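External metrics such as redis_celery_queue_length are not built into Kubernetes: something must export them (e.g. a Redis exporter reading the length of the celery list) and an adapter must serve them through the external metrics API. A sketch of a prometheus-adapter rule, assuming a Prometheus install in the cluster and the metric name used above:

```yaml
# prometheus-adapter values fragment (assumption: the metric is exported
# by a Redis exporter and scraped by Prometheus)
rules:
  external:
    - seriesQuery: 'redis_celery_queue_length{queue="celery"}'
      metricsQuery: 'max(redis_celery_queue_length{queue="celery"})'
      resources:
        namespaced: false
```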

Vertical Scaling

Increase Resource Limits

For Kubernetes, update values.generated.yaml:
resources:
  server:
    requests:
      cpu: "1000m"      # Increased from 500m
      memory: "2Gi"     # Increased from 1Gi
    limits:
      cpu: "4000m"      # Increased from 2000m
      memory: "8Gi"     # Increased from 4Gi
  
  celeryWorker:
    requests:
      cpu: "500m"       # Increased from 200m
      memory: "4Gi"     # Increased from 2Gi
    limits:
      cpu: "2000m"      # Increased from 1000m
      memory: "16Gi"    # Increased from 8Gi
Apply changes:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml

Vertical Pod Autoscaler (VPA)

Automatically adjust resource requests:
1. Install VPA

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
2. Create VPA for API server

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: aurora-server-vpa
  namespace: aurora
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  updatePolicy:
    updateMode: "Auto"  # or "Recreate" or "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
3. Monitor recommendations

kubectl get vpa aurora-server-vpa -n aurora
kubectl describe vpa aurora-server-vpa -n aurora

Database Scaling

PostgreSQL

Read Replicas

For read-heavy workloads, add read replicas.

Managed Database (Recommended):
  • AWS RDS: Create read replicas via console
  • GCP Cloud SQL: Enable read replicas
  • Azure Database: Add read replicas
Configure application:
config:
  POSTGRES_HOST: "aurora-primary.xyz.rds.amazonaws.com"
  POSTGRES_READ_REPLICA_HOST: "aurora-replica.xyz.rds.amazonaws.com"
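How the application uses the two hosts depends on its database layer; the general pattern is to route writes to the primary and plain reads to the replica. A minimal sketch of that routing decision (db_host is a hypothetical helper, not an Aurora API), assuming the two environment variables above:

```python
import os

def db_host(sql: str) -> str:
    """Route SELECTs to the read replica, everything else to the primary."""
    primary = os.environ["POSTGRES_HOST"]
    replica = os.environ.get("POSTGRES_READ_REPLICA_HOST", primary)
    # Only plain reads are safe on a (possibly lagging) replica.
    is_read = sql.lstrip().upper().startswith("SELECT")
    return replica if is_read else primary

os.environ["POSTGRES_HOST"] = "aurora-primary.xyz.rds.amazonaws.com"
os.environ["POSTGRES_READ_REPLICA_HOST"] = "aurora-replica.xyz.rds.amazonaws.com"
print(db_host("SELECT * FROM incidents"))   # replica
print(db_host("UPDATE incidents SET ..."))  # primary
```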
Self-Managed:
# Deploy read replica
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-replica
spec:
  serviceName: postgres-replica
  replicas: 2
  template:
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          env:
            - name: POSTGRES_PRIMARY_HOST
              value: "aurora-oss-postgres-0.aurora-oss-postgres"
            - name: POSTGRES_REPLICATION_MODE
              value: "slave"

Connection Pooling

Use PgBouncer to reduce database connections:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
  namespace: aurora
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: pgbouncer
          image: edoburu/pgbouncer:latest
          env:
            - name: DATABASE_URL
              value: "postgresql://aurora:password@aurora-oss-postgres:5432/aurora_db"
            - name: POOL_MODE
              value: "transaction"
            - name: MAX_CLIENT_CONN
              value: "1000"
            - name: DEFAULT_POOL_SIZE
              value: "25"
Update application:
config:
  POSTGRES_HOST: "pgbouncer"
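With transaction pooling, the number of actual Postgres connections is bounded by the pool size, not by the client count: 2 PgBouncer replicas × 25 server connections per pool is at most 50 backend connections, while each replica multiplexes up to 1000 clients (assuming a single user/database pair). A quick sanity check of that arithmetic:

```python
pgbouncer_replicas = 2
default_pool_size = 25   # server connections per pool, per replica
max_client_conn = 1000   # client connections per replica

server_connections = pgbouncer_replicas * default_pool_size
client_capacity = pgbouncer_replicas * max_client_conn
print(server_connections, client_capacity)  # 50 2000
```

Keep server_connections below the Postgres max_connections limit, with headroom for admin sessions.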

Redis Scaling

Redis Cluster

For high availability and sharding:
services:
  redis:
    enabled: false  # Disable built-in Redis

config:
  REDIS_URL: "redis://redis-cluster:6379/0"
Deploy Redis Cluster:
helm install redis bitnami/redis-cluster \
  --namespace aurora \
  --set cluster.nodes=6 \
  --set cluster.replicas=1

Redis Sentinel

For failover without sharding:
helm install redis bitnami/redis \
  --namespace aurora \
  --set sentinel.enabled=true \
  --set master.persistence.size=20Gi \
  --set replica.replicaCount=2

Vector Database Scaling

Weaviate Clustering

For production, use Weaviate Cloud or a multi-node cluster.

Weaviate Cloud (Recommended):
services:
  weaviate:
    enabled: false

config:
  WEAVIATE_HOST: "aurora-cluster.weaviate.network"
  WEAVIATE_PORT: "443"
  WEAVIATE_SCHEME: "https"
Self-Managed Cluster:
replicaCounts:
  weaviate: 3

weaviate:
  cluster:
    enabled: true
    replicas: 3

Load Balancing

Ingress Session Affinity

For WebSocket connections, enable session affinity:
ingress:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "aurora-ws-affinity"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"

External Load Balancer

For cloud deployments:

AWS ALB:
ingress:
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /health
GCP Load Balancer:
ingress:
  className: "gce"
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "aurora-ip"

Monitoring Scaling

Key Metrics to Track

Scrape Aurora pods with Prometheus:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'aurora-metrics'
        metrics_path: '/metrics'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - aurora
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: aurora-.*
Metrics to monitor:
  • Request rate (requests/sec)
  • Response time (p50, p95, p99)
  • Error rate (%)
  • CPU usage (%)
  • Memory usage (%)
  • Celery queue length
  • Database connections
  • Redis memory usage
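To act on these metrics, alerting rules work well alongside the HPA. A hedged example (the metric names are assumptions and must match what your exporters actually expose):

```yaml
groups:
  - name: aurora-scaling
    rules:
      - alert: HighCeleryQueueDepth
        expr: redis_celery_queue_length{queue="celery"} > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Celery queue backing up; consider raising worker replicas"
      - alert: HighApiLatencyP95
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="aurora-metrics"}[5m])) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 API latency above 500ms"
```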

Grafana Dashboard

Import Aurora dashboard:
kubectl create configmap grafana-dashboard-aurora \
  --from-file=aurora-dashboard.json \
  -n monitoring

Performance Optimization

Caching

Enable aggressive caching:
config:
  # Cloud provider API caching
  AURORA_SETUP_CACHE_ENABLED: "true"
  AURORA_SETUP_CACHE_TTL: "7200"  # 2 hours
  
  # Storage caching
  STORAGE_CACHE_ENABLED: "true"
  STORAGE_CACHE_TTL: "300"  # 5 minutes
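The TTL settings above bound how stale cached data can get: a larger TTL means fewer upstream calls but older results. For intuition, a minimal TTL cache in plain Python (an illustration of the pattern, not Aurora's implementation):

```python
import time

class TTLCache:
    """Tiny TTL cache: entries expire ttl_seconds after being set."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return default
        return value

cache = TTLCache(ttl_seconds=300)  # mirrors STORAGE_CACHE_TTL
cache.set("bucket-list", ["logs", "artifacts"])
print(cache.get("bucket-list"))  # ['logs', 'artifacts']
```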

Cost Optimization

Reduce LLM costs when running at scale:
config:
  RCA_OPTIMIZE_COSTS: "true"  # Use cheaper models when possible
  AGENT_RECURSION_LIMIT: "120"  # Reduce from 240 for faster completion

Testing Scaling

Load Testing

Use k6 for load testing:
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 10 },   // Ramp up to 10 users
    { duration: '5m', target: 50 },   // Ramp up to 50 users
    { duration: '10m', target: 100 }, // Stay at 100 users
    { duration: '2m', target: 0 },    // Ramp down
  ],
};

export default function () {
  const res = http.get('https://api.aurora.example.com/health');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
Run test:
k6 run load-test.js

Stress Testing

// stress-test.js (reuses the default function from load-test.js)
export let options = {
  stages: [
    { duration: '5m', target: 1000 },  // Ramp to 1000 users
    { duration: '10m', target: 1000 }, // Stay at peak
  ],
};

Scaling Checklist

Before scaling to production:
  • Resource requests and limits configured
  • HPA configured for stateless services
  • Database connection pooling enabled
  • Redis clustering or Sentinel configured
  • Weaviate clustering or cloud instance
  • Session affinity enabled for WebSocket
  • Monitoring and alerting set up
  • Load testing performed
  • Cost optimization settings reviewed
  • Backup strategy scales with data growth
  • Log aggregation handles increased volume
  • Network policies allow pod-to-pod communication

Common Scaling Issues

Pod OOMKilled

Increase memory limits:
resources:
  celeryWorker:
    limits:
      memory: "16Gi"  # Increased from 8Gi

Database Connection Exhaustion

Add PgBouncer or increase connection limits:
postgres:
  config:
    max_connections: "500"  # Increased from 100

Redis Memory Issues

Increase Redis memory or add eviction policy:
redis:
  config:
    maxmemory: "2gb"
    maxmemory-policy: "allkeys-lru"
Note: if this Redis instance also serves as the Celery broker, an allkeys policy can evict queued tasks; there, prefer noeviction (or volatile-lru) and scale memory instead.

Slow Response Times

Profile application:
# Enable profiling
kubectl exec -it deployment/aurora-oss-server -n aurora -- \
  python -m cProfile -s cumtime main_compute.py

Next Steps

Production Best Practices

Security and reliability for production

Monitoring

Set up comprehensive monitoring

Performance Tuning

Optimize Aurora performance

Troubleshooting

Common scaling issues
