Scaling Aurora

Guidelines for scaling Aurora to handle increased load, traffic, and concurrent users.

Scaling Strategy

Aurora components fall into three categories:

Stateless Services (Horizontally Scalable)

These services can be scaled by increasing replica count:
  • aurora-server - REST API (handles HTTP requests)
  • celery-worker - Background tasks (RCA analysis, integrations)
  • chatbot - WebSocket server (chat interface)
  • frontend - Next.js UI (serves web pages)
  • searxng - Web search engine
  • t2v-transformers - ML embeddings

Stateful Services (Require Special Configuration)

These services require additional setup for horizontal scaling:
  • postgres - Database (replication, read replicas)
  • redis - Cache and queue (Redis Cluster)
  • weaviate - Vector database (multi-node cluster)
  • vault - Secrets management (Raft HA)

Single-Instance Services

These MUST remain at 1 replica:
  • celery-beat - Task scheduler (multiple instances cause duplicate tasks)

Horizontal Scaling

Kubernetes (Helm)

Increase replica counts in values.generated.yaml:
replicaCounts:
  # Scale based on traffic
  server: 5          # API requests
  celeryWorker: 10   # Background tasks
  chatbot: 3         # WebSocket connections
  frontend: 3        # Web traffic
  
  # Scale for performance
  searxng: 2         # Web search
  transformers: 2    # ML embeddings
  
  # Keep at 1
  celeryBeat: 1      # DO NOT SCALE
Apply changes:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml

Docker Compose

Scale services manually:
# Scale specific service
docker compose up -d --scale celery_worker=5

# Scale multiple services
docker compose up -d \
  --scale celery_worker=5 \
  --scale aurora-server=3
Or edit docker-compose.yaml:
services:
  celery_worker:
    # ... existing config ...
    deploy:
      replicas: 5

Auto-Scaling

Horizontal Pod Autoscaler (HPA)

Automatically scale based on CPU/memory:
1. Enable metrics server

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
2. Create HPA for API server

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-server-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
Apply:
kubectl apply -f aurora-server-hpa.yaml
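The HPA scales using the standard formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A quick sketch of that arithmetic for the 70% CPU target above:

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Standard HPA formula: ceil(current * current/target), clamped to bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas at 95% average CPU against a 70% target -> scale up to 5
print(desired_replicas(3, 95, 70, min_replicas=3, max_replicas=10))  # 5
```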
3. Create HPA for Celery workers

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-celery-worker-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-celery-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
4. Monitor autoscaling

# Check HPA status
kubectl get hpa -n aurora

# Watch scaling events
kubectl get hpa -n aurora -w

# View scaling events
kubectl describe hpa aurora-server-hpa -n aurora

Custom Metrics Autoscaling

Scale based on application metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-celery-queue-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-celery-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # Scale based on Celery queue length
    - type: External
      external:
        metric:
          name: redis_celery_queue_length
          selector:
            matchLabels:
              queue: celery
        target:
          type: AverageValue
          averageValue: "10"  # 10 tasks per worker
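External metrics such as redis_celery_queue_length are not built into Kubernetes: something must export them (e.g. a Redis exporter reading the length of the celery list) and an adapter must serve them through the external metrics API. A sketch of a prometheus-adapter rule, assuming a Prometheus install in the cluster and the metric name used above:

```yaml
# prometheus-adapter values fragment (assumption: the metric is exported
# by a Redis exporter and scraped by Prometheus)
rules:
  external:
    - seriesQuery: 'redis_celery_queue_length{queue="celery"}'
      metricsQuery: 'max(redis_celery_queue_length{queue="celery"})'
      resources:
        namespaced: false
```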

Vertical Scaling

Increase Resource Limits

For Kubernetes, update values.generated.yaml:
resources:
  server:
    requests:
      cpu: "1000m"      # Increased from 500m
      memory: "2Gi"     # Increased from 1Gi
    limits:
      cpu: "4000m"      # Increased from 2000m
      memory: "8Gi"     # Increased from 4Gi
  
  celeryWorker:
    requests:
      cpu: "500m"       # Increased from 200m
      memory: "4Gi"     # Increased from 2Gi
    limits:
      cpu: "2000m"      # Increased from 1000m
      memory: "16Gi"    # Increased from 8Gi
Apply changes:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml

Vertical Pod Autoscaler (VPA)

Automatically adjust resource requests:
1. Install VPA

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
2. Create VPA for API server

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: aurora-server-vpa
  namespace: aurora
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  updatePolicy:
    updateMode: "Auto"  # or "Recreate" or "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
3. Monitor recommendations

kubectl get vpa aurora-server-vpa -n aurora
kubectl describe vpa aurora-server-vpa -n aurora

Database Scaling

PostgreSQL

Read Replicas

For read-heavy workloads, add read replicas.

Managed Database (Recommended):
  • AWS RDS: Create read replicas via console
  • GCP Cloud SQL: Enable read replicas
  • Azure Database: Add read replicas
Configure application:
config:
  POSTGRES_HOST: "aurora-primary.xyz.rds.amazonaws.com"
  POSTGRES_READ_REPLICA_HOST: "aurora-replica.xyz.rds.amazonaws.com"
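How the application uses the two hosts depends on its database layer; the general pattern is to route writes to the primary and plain reads to the replica. A minimal sketch of that routing decision (db_host is a hypothetical helper, not an Aurora API), assuming the two environment variables above:

```python
import os

def db_host(sql: str) -> str:
    """Route SELECTs to the read replica, everything else to the primary."""
    primary = os.environ["POSTGRES_HOST"]
    replica = os.environ.get("POSTGRES_READ_REPLICA_HOST", primary)
    # Only plain reads are safe on a (possibly lagging) replica.
    is_read = sql.lstrip().upper().startswith("SELECT")
    return replica if is_read else primary

os.environ["POSTGRES_HOST"] = "aurora-primary.xyz.rds.amazonaws.com"
os.environ["POSTGRES_READ_REPLICA_HOST"] = "aurora-replica.xyz.rds.amazonaws.com"
print(db_host("SELECT * FROM incidents"))   # replica
print(db_host("UPDATE incidents SET ..."))  # primary
```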
Self-Managed:
# Deploy read replica
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-replica
spec:
  serviceName: postgres-replica
  replicas: 2
  template:
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          env:
            - name: POSTGRES_PRIMARY_HOST
              value: "aurora-oss-postgres-0.aurora-oss-postgres"
            - name: POSTGRES_REPLICATION_MODE
              value: "slave"

Connection Pooling

Use PgBouncer to reduce database connections:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
  namespace: aurora
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: pgbouncer
          image: edoburu/pgbouncer:latest
          env:
            - name: DATABASE_URL
              value: "postgresql://aurora:password@aurora-oss-postgres:5432/aurora_db"
            - name: POOL_MODE
              value: "transaction"
            - name: MAX_CLIENT_CONN
              value: "1000"
            - name: DEFAULT_POOL_SIZE
              value: "25"
Update application:
config:
  POSTGRES_HOST: "pgbouncer"
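With transaction pooling, the number of actual Postgres connections is bounded by the pool size, not by the client count: 2 PgBouncer replicas × 25 server connections per pool is at most 50 backend connections, while each replica multiplexes up to 1000 clients (assuming a single user/database pair). A quick sanity check of that arithmetic:

```python
pgbouncer_replicas = 2
default_pool_size = 25   # server connections per pool, per replica
max_client_conn = 1000   # client connections per replica

server_connections = pgbouncer_replicas * default_pool_size
client_capacity = pgbouncer_replicas * max_client_conn
print(server_connections, client_capacity)  # 50 2000
```

Keep server_connections below the Postgres max_connections limit, with headroom for admin sessions.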

Redis Scaling

Redis Cluster

For high availability and sharding:
services:
  redis:
    enabled: false  # Disable built-in Redis

config:
  REDIS_URL: "redis://redis-cluster:6379/0"
Deploy Redis Cluster:
helm install redis bitnami/redis-cluster \
  --namespace aurora \
  --set cluster.nodes=6 \
  --set cluster.replicas=1

Redis Sentinel

For failover without sharding:
helm install redis bitnami/redis \
  --namespace aurora \
  --set sentinel.enabled=true \
  --set master.persistence.size=20Gi \
  --set replica.replicaCount=2

Vector Database Scaling

Weaviate Clustering

For production, use Weaviate Cloud or a multi-node cluster.

Weaviate Cloud (Recommended):
services:
  weaviate:
    enabled: false

config:
  WEAVIATE_HOST: "aurora-cluster.weaviate.network"
  WEAVIATE_PORT: "443"
  WEAVIATE_SCHEME: "https"
Self-Managed Cluster:
replicaCounts:
  weaviate: 3

weaviate:
  cluster:
    enabled: true
    replicas: 3

Load Balancing

Ingress Session Affinity

For WebSocket connections, enable session affinity:
ingress:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "aurora-ws-affinity"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"

External Load Balancer

For cloud deployments:

AWS ALB:
ingress:
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /health
GCP Load Balancer:
ingress:
  className: "gce"
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "aurora-ip"

Monitoring Scaling

Key Metrics to Track

Scrape Aurora pods with Prometheus:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'aurora-metrics'
        metrics_path: '/metrics'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - aurora
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: aurora-.*
Metrics to monitor:
  • Request rate (requests/sec)
  • Response time (p50, p95, p99)
  • Error rate (%)
  • CPU usage (%)
  • Memory usage (%)
  • Celery queue length
  • Database connections
  • Redis memory usage
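To act on these metrics, alerting rules work well alongside the HPA. A hedged example (the metric names are assumptions and must match what your exporters actually expose):

```yaml
groups:
  - name: aurora-scaling
    rules:
      - alert: HighCeleryQueueDepth
        expr: redis_celery_queue_length{queue="celery"} > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Celery queue backing up; consider raising worker replicas"
      - alert: HighApiLatencyP95
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="aurora-metrics"}[5m])) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 API latency above 500ms"
```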

Grafana Dashboard

Import Aurora dashboard:
kubectl create configmap grafana-dashboard-aurora \
  --from-file=aurora-dashboard.json \
  -n monitoring

Performance Optimization

Caching

Enable aggressive caching:
config:
  # Cloud provider API caching
  AURORA_SETUP_CACHE_ENABLED: "true"
  AURORA_SETUP_CACHE_TTL: "7200"  # 2 hours
  
  # Storage caching
  STORAGE_CACHE_ENABLED: "true"
  STORAGE_CACHE_TTL: "300"  # 5 minutes
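The TTL settings above bound how stale cached data can get: a larger TTL means fewer upstream calls but older results. For intuition, a minimal TTL cache in plain Python (an illustration of the pattern, not Aurora's implementation):

```python
import time

class TTLCache:
    """Tiny TTL cache: entries expire ttl_seconds after being set."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return default
        return value

cache = TTLCache(ttl_seconds=300)  # mirrors STORAGE_CACHE_TTL
cache.set("bucket-list", ["logs", "artifacts"])
print(cache.get("bucket-list"))  # ['logs', 'artifacts']
```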

Cost Optimization

Reduce LLM costs when running at scale:
config:
  RCA_OPTIMIZE_COSTS: "true"  # Use cheaper models when possible
  AGENT_RECURSION_LIMIT: "120"  # Reduce from 240 for faster completion

Testing Scaling

Load Testing

Use k6 for load testing:
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 10 },   // Ramp up to 10 users
    { duration: '5m', target: 50 },   // Ramp up to 50 users
    { duration: '10m', target: 100 }, // Stay at 100 users
    { duration: '2m', target: 0 },    // Ramp down
  ],
};

export default function () {
  const res = http.get('https://api.aurora.example.com/health');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
Run test:
k6 run load-test.js

Stress Testing

// stress-test.js (reuses the default function from load-test.js)
export let options = {
  stages: [
    { duration: '5m', target: 1000 },  // Ramp to 1000 users
    { duration: '10m', target: 1000 }, // Stay at peak
  ],
};

Scaling Checklist

Before scaling to production:
  • Resource requests and limits configured
  • HPA configured for stateless services
  • Database connection pooling enabled
  • Redis clustering or Sentinel configured
  • Weaviate clustering or cloud instance
  • Session affinity enabled for WebSocket
  • Monitoring and alerting set up
  • Load testing performed
  • Cost optimization settings reviewed
  • Backup strategy scales with data growth
  • Log aggregation handles increased volume
  • Network policies allow pod-to-pod communication

Common Scaling Issues

Pod OOMKilled

Increase memory limits:
resources:
  celeryWorker:
    limits:
      memory: "16Gi"  # Increased from 8Gi

Database Connection Exhaustion

Add PgBouncer or increase connection limits:
postgres:
  config:
    max_connections: "500"  # Increased from 100

Redis Memory Issues

Increase Redis memory or add eviction policy:
redis:
  config:
    maxmemory: "2gb"
    maxmemory-policy: "allkeys-lru"
Note: if this Redis instance also serves as the Celery broker, an allkeys policy can evict queued tasks; there, prefer noeviction (or volatile-lru) and scale memory instead.

Slow Response Times

Profile application:
# Enable profiling
kubectl exec -it deployment/aurora-oss-server -n aurora -- \
  python -m cProfile -s cumtime main_compute.py

Next Steps

Production Best Practices

Security and reliability for production

Monitoring

Set up comprehensive monitoring

Performance Tuning

Optimize Aurora performance

Troubleshooting

Common scaling issues
