Kubernetes Deployment

Deploy Aurora on Kubernetes clusters using Helm for production-grade, scalable deployments.

Prerequisites

  • Kubernetes cluster 1.19 or newer
  • Helm 3.x installed
  • kubectl configured
  • 16GB RAM across nodes (minimum)
  • 100GB persistent storage
  • S3-compatible object storage (AWS S3, MinIO, Cloudflare R2, etc.)

Architecture Overview

Aurora on Kubernetes consists of the following services.

Application Services:
  • aurora-server - Flask REST API (scalable)
  • aurora-chatbot - WebSocket service (scalable)
  • aurora-frontend - Next.js UI (scalable)
  • celery-worker - Background tasks (scalable)
  • celery-beat - Task scheduler (single instance)
Stateful Services:
  • postgres - PostgreSQL database
  • redis - Task queue and cache
  • weaviate - Vector database
  • vault - Secrets management
Supporting Services:
  • searxng - Web search engine
  • t2v-transformers - ML embeddings

Quick Start

Step 1: Prepare configuration

Copy the Helm values template:
cd deploy/helm/aurora
cp values.yaml values.generated.yaml
DO NOT commit values.generated.yaml - it contains secrets! Add it to .gitignore if not already present.
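One way to guard against accidental commits is an idempotent append to .gitignore (a sketch; the path assumes the chart lives at deploy/helm/aurora relative to the repo root):

```shell
# Add values.generated.yaml to .gitignore exactly once
# (path is an assumption; adjust if your chart lives elsewhere)
FILE="deploy/helm/aurora/values.generated.yaml"
touch .gitignore
grep -qxF "$FILE" .gitignore || echo "$FILE" >> .gitignore
```

Running this repeatedly leaves only a single entry, so it is safe in setup scripts.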
Step 2: Configure required settings

Edit values.generated.yaml and set:
# Container registry (REQUIRED)
image:
  registry: "ghcr.io/your-org"  # or docker.io, gcr.io, etc.
  tag: "latest"

# Object Storage (REQUIRED)
config:
  STORAGE_BUCKET: "aurora-production"
  STORAGE_ENDPOINT_URL: "https://s3.amazonaws.com"
  STORAGE_REGION: "us-east-1"
  
  # Public URLs (REQUIRED)
  NEXT_PUBLIC_BACKEND_URL: "https://api.aurora.example.com"
  NEXT_PUBLIC_WEBSOCKET_URL: "wss://ws.aurora.example.com"
  FRONTEND_URL: "https://aurora.example.com"

# Secrets (REQUIRED)
secrets:
  db:
    POSTGRES_PASSWORD: "<generate-with-openssl>"
  
  backend:
    VAULT_TOKEN: "<set-after-vault-init>"
    STORAGE_ACCESS_KEY: "<your-s3-key>"
    STORAGE_SECRET_KEY: "<your-s3-secret>"
  
  app:
    FLASK_SECRET_KEY: "<generate-with-openssl>"
    AUTH_SECRET: "<generate-with-openssl>"
    SEARXNG_SECRET: "<generate-with-openssl>"
  
  llm:
    OPENROUTER_API_KEY: "sk-or-v1-..."

# Ingress
ingress:
  enabled: true
  className: "nginx"
  hosts:
    frontend: "aurora.example.com"
    api: "api.aurora.example.com"
    ws: "ws.aurora.example.com"
Generate secrets:
openssl rand -base64 32  # Run 3x for passwords/secrets
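The three values can be captured in one pass (a sketch; assumes openssl is on PATH, and the variable names simply mirror the secret keys above):

```shell
# Generate one independent secret per setting;
# each call emits 44 base64 characters from 32 random bytes
FLASK_SECRET_KEY="$(openssl rand -base64 32)"
AUTH_SECRET="$(openssl rand -base64 32)"
SEARXNG_SECRET="$(openssl rand -base64 32)"
```

Paste each value into the matching key under `secrets.app` in values.generated.yaml.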
Step 3: Build and push images

Build images with your registry:
make deploy-build
This command:
  • Reads values.generated.yaml for registry configuration
  • Builds images with git SHA tag (e.g., abc123f)
  • Pushes to your container registry
  • Updates image.tag in values.generated.yaml
Requires Docker Buildx and authentication to your container registry.

Login examples:
# Docker Hub
docker login

# GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

# Google Container Registry
gcloud auth configure-docker

# AWS ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
Step 4: Deploy with Helm

helm upgrade --install aurora-oss ./deploy/helm/aurora \
  --namespace aurora --create-namespace \
  --reset-values \
  -f deploy/helm/aurora/values.generated.yaml
Or use the Makefile:
make deploy
Step 5: Initialize Vault

On first deployment, initialize Vault:
kubectl exec -it -n aurora statefulset/aurora-oss-vault -- vault operator init
Save the unseal keys and root token securely!

Update values.generated.yaml:
secrets:
  backend:
    VAULT_TOKEN: "hvs.CAESI..."
Redeploy to apply token:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f deploy/helm/aurora/values.generated.yaml
Step 6: Verify deployment

# Check pod status
kubectl get pods -n aurora

# Check services
kubectl get svc -n aurora

# Check ingress
kubectl get ingress -n aurora

# View logs
kubectl logs -n aurora deployment/aurora-oss-server --tail=50

Configuration

Replica Counts

Scale application services:
replicaCounts:
  server: 3          # API servers (scalable)
  celeryWorker: 5    # Background workers (scalable)
  chatbot: 2         # WebSocket servers (scalable)
  frontend: 2        # Next.js frontend (scalable)
  searxng: 1         # Search engine (scalable)
  transformers: 2    # ML service (scalable)
  
  # Single instance only:
  celeryBeat: 1      # Task scheduler (DO NOT scale)
  redis: 1           # Cache (clustering requires config)
  weaviate: 1        # Vector DB (clustering requires config)
  vault: 1           # Secrets (HA requires config)
  postgres: 1        # Database (replication requires config)

External Services

Use managed services instead of in-cluster deployments:
services:
  postgres:
    enabled: false    # Use RDS, Cloud SQL, etc.
  redis:
    enabled: false    # Use ElastiCache, Memorystore, etc.
  weaviate:
    enabled: false    # Use Weaviate Cloud
  vault:
    enabled: false    # Use external Vault or cloud KMS

config:
  # Configure external endpoints
  POSTGRES_HOST: "aurora-db.xyz.us-east-1.rds.amazonaws.com"
  REDIS_URL: "redis://aurora-cache.abc.0001.use1.cache.amazonaws.com:6379/0"
  WEAVIATE_HOST: "aurora-cluster.weaviate.network"
  VAULT_ADDR: "https://vault.company.com:8200"
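Credentials for managed services still flow through the same secrets block shown in the Quick Start, assuming the chart wires those keys to the external endpoints (illustrative placeholder values):

```yaml
secrets:
  db:
    POSTGRES_PASSWORD: "<rds-master-password>"
  backend:
    STORAGE_ACCESS_KEY: "<your-s3-key>"
    STORAGE_SECRET_KEY: "<your-s3-secret>"
```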

Resource Limits

Adjust based on workload:
resources:
  server:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  
  celeryWorker:
    requests:
      cpu: "200m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "8Gi"

Persistent Storage

persistence:
  postgres:
    size: 100Gi
    # Optional: use specific storage class
    # storageClassName: "fast-ssd"
  
  weaviate:
    size: 100Gi
  
  vault:
    size: 20Gi
  
  redis:
    size: 10Gi

Ingress Configuration

Aurora uses subdomain-based routing:
ingress:
  enabled: true
  className: "nginx"  # or "traefik", "alb", "gce"
  
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    external-dns.alpha.kubernetes.io/hostname: "aurora.example.com"
  
  # TLS with cert-manager
  tls:
    enabled: true
    certManager:
      enabled: true
      issuer: "letsencrypt-prod"
      email: "[email protected]"
  
  hosts:
    frontend: "aurora.example.com"
    api: "api.aurora.example.com"
    ws: "ws.aurora.example.com"
Important Ingress Settings:

For WebSocket and long-running requests:
  • proxy-read-timeout: 3600s - RCA analysis can take 30+ minutes
  • proxy-http-version: 1.1 - Required for WebSocket upgrade
  • proxy-body-size: 50m - For file uploads
These are auto-configured for nginx. For other controllers (Traefik, ALB, GCE), configure equivalent settings via ingress.annotations.
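For instance, with the AWS Load Balancer Controller, a comparable idle timeout can be supplied through ingress.annotations (a sketch; confirm the annotation keys against your controller's documentation):

```yaml
ingress:
  annotations:
    # Raise the ALB idle timeout to cover long-running RCA requests
    alb.ingress.kubernetes.io/load-balancer-attributes: "idle_timeout.timeout_seconds=3600"
```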

Pod Isolation

Enable isolated terminal pods for untrusted code execution:
config:
  ENABLE_POD_ISOLATION: "true"
  TERMINAL_NAMESPACE: "untrusted"
  TERMINAL_IMAGE: "ghcr.io/your-org/aurora-terminal:abc123f"
  TERMINAL_POD_TTL: "3600"
  TERMINAL_RUNTIME_CLASS: "gvisor"  # Optional: gvisor, kata
The chart automatically creates:
  • Isolated namespace
  • RBAC for pod management
  • NetworkPolicy blocking cluster access
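The generated NetworkPolicy is conceptually similar to this sketch, which denies all egress from the isolated namespace except DNS (illustrative only, not the chart's exact manifest):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cluster-egress
  namespace: untrusted
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - ports:             # only DNS lookups are allowed out;
        - protocol: UDP  # all other egress is denied by default
          port: 53
```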
For additional hardening:
# Create RuntimeClass for gVisor
kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF

# Taint nodes for untrusted workloads
kubectl label node worker-1 workload=untrusted
kubectl taint node worker-1 workload=untrusted:NoSchedule
Then enable:
config:
  TERMINAL_RUNTIME_CLASS: "gvisor"
  USE_UNTRUSTED_NODES: "true"

Helm Commands

Install

helm install aurora-oss ./deploy/helm/aurora \
  --namespace aurora --create-namespace \
  -f values.generated.yaml

Upgrade

helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml

Rollback

# List revisions
helm history aurora-oss -n aurora

# Rollback to previous
helm rollback aurora-oss -n aurora

# Rollback to specific revision
helm rollback aurora-oss 3 -n aurora

Uninstall

helm uninstall aurora-oss -n aurora

# Delete namespace (removes PVCs)
kubectl delete namespace aurora

Troubleshooting

Pods Not Starting

Check pod status:
kubectl get pods -n aurora
kubectl describe pod <pod-name> -n aurora
Common issues:
  • Image pull errors (check registry authentication)
  • Resource limits (insufficient CPU/memory)
  • PVC binding issues (check storage class)

Database Connection Errors

Check Postgres:
kubectl logs -n aurora statefulset/aurora-oss-postgres --tail=50

# Test connection
kubectl exec -it -n aurora statefulset/aurora-oss-postgres -- \
  psql -U aurora -d aurora_db -c "SELECT 1;"

Vault Issues

Check Vault status:
kubectl exec -it -n aurora statefulset/aurora-oss-vault -- vault status
If sealed:
# Unseal (repeat with 3 different keys)
kubectl exec -it -n aurora statefulset/aurora-oss-vault -- \
  vault operator unseal <unseal-key-1>

Ingress Not Working

Check ingress:
kubectl get ingress -n aurora
kubectl describe ingress aurora-oss -n aurora
Verify DNS:
nslookup aurora.example.com
Test internal access:
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://aurora-oss-server:5080/health

View Logs

# All pods
kubectl logs -n aurora -l app.kubernetes.io/name=aurora --tail=100

# Specific service
kubectl logs -n aurora deployment/aurora-oss-server -f
kubectl logs -n aurora deployment/aurora-oss-celery-worker -f
kubectl logs -n aurora deployment/aurora-oss-chatbot -f

kubectl Agent

Connect Aurora to other Kubernetes clusters using the kubectl agent, which executes Kubernetes commands in remote clusters on Aurora's behalf. Quick example:
# In the cluster you want to connect
helm install aurora-kubectl-agent ./kubectl-agent/chart \
  --namespace aurora --create-namespace \
  --set aurora.backendUrl="https://api.aurora.example.com" \
  --set aurora.wsEndpoint="wss://ws.aurora.example.com/kubectl-agent" \
  --set aurora.agentToken="<token-from-aurora-ui>" \
  --set agent.image.repository="your-registry/aurora-kubectl-agent" \
  --set agent.image.tag="1.0.3"

Next Steps

Production Best Practices

Security, monitoring, and reliability

Scaling Guide

Scale Aurora for high availability

Backup & Recovery

Protect your data

Monitoring

Set up observability
