Kubernetes Deployment

Deploy Aurora on Kubernetes clusters using Helm for production-grade, scalable deployments.

Prerequisites

  • Kubernetes cluster 1.19 or newer
  • Helm 3.x installed
  • kubectl configured
  • 16GB RAM across nodes (minimum)
  • 100GB persistent storage
  • S3-compatible object storage (AWS S3, MinIO, Cloudflare R2, etc.)

Architecture Overview

Aurora on Kubernetes consists of the following services.

Application Services:
  • aurora-server - Flask REST API (scalable)
  • aurora-chatbot - WebSocket service (scalable)
  • aurora-frontend - Next.js UI (scalable)
  • celery-worker - Background tasks (scalable)
  • celery-beat - Task scheduler (single instance)
Stateful Services:
  • postgres - PostgreSQL database
  • redis - Task queue and cache
  • weaviate - Vector database
  • vault - Secrets management
Supporting Services:
  • searxng - Web search engine
  • t2v-transformers - ML embeddings

Quick Start

Step 1: Prepare configuration

Copy the Helm values template:
cd deploy/helm/aurora
cp values.yaml values.generated.yaml
DO NOT commit values.generated.yaml - it contains secrets! Add it to .gitignore if not already present.
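One way to guard against accidental commits is an idempotent append to .gitignore (a sketch; the path assumes the chart lives at deploy/helm/aurora relative to the repo root):

```shell
# Add values.generated.yaml to .gitignore exactly once
# (path is an assumption; adjust if your chart lives elsewhere)
FILE="deploy/helm/aurora/values.generated.yaml"
touch .gitignore
grep -qxF "$FILE" .gitignore || echo "$FILE" >> .gitignore
```

Running this repeatedly leaves only a single entry, so it is safe in setup scripts.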
Step 2: Configure required settings

Edit values.generated.yaml and set:
# Container registry (REQUIRED)
image:
  registry: "ghcr.io/your-org"  # or docker.io, gcr.io, etc.
  tag: "latest"

# Object Storage (REQUIRED)
config:
  STORAGE_BUCKET: "aurora-production"
  STORAGE_ENDPOINT_URL: "https://s3.amazonaws.com"
  STORAGE_REGION: "us-east-1"
  
  # Public URLs (REQUIRED)
  NEXT_PUBLIC_BACKEND_URL: "https://api.aurora.example.com"
  NEXT_PUBLIC_WEBSOCKET_URL: "wss://ws.aurora.example.com"
  FRONTEND_URL: "https://aurora.example.com"

# Secrets (REQUIRED)
secrets:
  db:
    POSTGRES_PASSWORD: "<generate-with-openssl>"
  
  backend:
    VAULT_TOKEN: "<set-after-vault-init>"
    STORAGE_ACCESS_KEY: "<your-s3-key>"
    STORAGE_SECRET_KEY: "<your-s3-secret>"
  
  app:
    FLASK_SECRET_KEY: "<generate-with-openssl>"
    AUTH_SECRET: "<generate-with-openssl>"
    SEARXNG_SECRET: "<generate-with-openssl>"
  
  llm:
    OPENROUTER_API_KEY: "sk-or-v1-..."

# Ingress
ingress:
  enabled: true
  className: "nginx"
  hosts:
    frontend: "aurora.example.com"
    api: "api.aurora.example.com"
    ws: "ws.aurora.example.com"
Generate secrets:
openssl rand -base64 32  # Run 3x for passwords/secrets
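The three values can be captured in one pass (a sketch; assumes openssl is on PATH, and the variable names simply mirror the secret keys above):

```shell
# Generate one independent secret per setting;
# each call emits 44 base64 characters from 32 random bytes
FLASK_SECRET_KEY="$(openssl rand -base64 32)"
AUTH_SECRET="$(openssl rand -base64 32)"
SEARXNG_SECRET="$(openssl rand -base64 32)"
```

Paste each value into the matching key under `secrets.app` in values.generated.yaml.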
Step 3: Build and push images

Build images with your registry:
make deploy-build
This command:
  • Reads values.generated.yaml for registry configuration
  • Builds images with git SHA tag (e.g., abc123f)
  • Pushes to your container registry
  • Updates image.tag in values.generated.yaml
Requires Docker Buildx and authentication to your container registry.

Login examples:
# Docker Hub
docker login

# GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

# Google Container Registry
gcloud auth configure-docker

# AWS ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
Step 4: Deploy with Helm

helm upgrade --install aurora-oss ./deploy/helm/aurora \
  --namespace aurora --create-namespace \
  --reset-values \
  -f deploy/helm/aurora/values.generated.yaml
Or use the Makefile:
make deploy
Step 5: Initialize Vault

On first deployment, initialize Vault:
kubectl exec -it -n aurora statefulset/aurora-oss-vault -- vault operator init
Save the unseal keys and root token securely!

Update values.generated.yaml:
secrets:
  backend:
    VAULT_TOKEN: "hvs.CAESI..."
Redeploy to apply token:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f deploy/helm/aurora/values.generated.yaml
Step 6: Verify deployment

# Check pod status
kubectl get pods -n aurora

# Check services
kubectl get svc -n aurora

# Check ingress
kubectl get ingress -n aurora

# View logs
kubectl logs -n aurora deployment/aurora-oss-server --tail=50

Configuration

Replica Counts

Scale application services:
replicaCounts:
  server: 3          # API servers (scalable)
  celeryWorker: 5    # Background workers (scalable)
  chatbot: 2         # WebSocket servers (scalable)
  frontend: 2        # Next.js frontend (scalable)
  searxng: 1         # Search engine (scalable)
  transformers: 2    # ML service (scalable)
  
  # Single instance only:
  celeryBeat: 1      # Task scheduler (DO NOT scale)
  redis: 1           # Cache (clustering requires config)
  weaviate: 1        # Vector DB (clustering requires config)
  vault: 1           # Secrets (HA requires config)
  postgres: 1        # Database (replication requires config)

External Services

Use managed services instead of in-cluster deployments:
services:
  postgres:
    enabled: false    # Use RDS, Cloud SQL, etc.
  redis:
    enabled: false    # Use ElastiCache, Memorystore, etc.
  weaviate:
    enabled: false    # Use Weaviate Cloud
  vault:
    enabled: false    # Use external Vault or cloud KMS

config:
  # Configure external endpoints
  POSTGRES_HOST: "aurora-db.xyz.us-east-1.rds.amazonaws.com"
  REDIS_URL: "redis://aurora-cache.abc.0001.use1.cache.amazonaws.com:6379/0"
  WEAVIATE_HOST: "aurora-cluster.weaviate.network"
  VAULT_ADDR: "https://vault.company.com:8200"
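Credentials for managed services still flow through the same secrets block shown in the Quick Start, assuming the chart wires those keys to the external endpoints (illustrative placeholder values):

```yaml
secrets:
  db:
    POSTGRES_PASSWORD: "<rds-master-password>"
  backend:
    STORAGE_ACCESS_KEY: "<your-s3-key>"
    STORAGE_SECRET_KEY: "<your-s3-secret>"
```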

Resource Limits

Adjust based on workload:
resources:
  server:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  
  celeryWorker:
    requests:
      cpu: "200m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "8Gi"

Persistent Storage

persistence:
  postgres:
    size: 100Gi
    # Optional: use specific storage class
    # storageClassName: "fast-ssd"
  
  weaviate:
    size: 100Gi
  
  vault:
    size: 20Gi
  
  redis:
    size: 10Gi

Ingress Configuration

Aurora uses subdomain-based routing:
ingress:
  enabled: true
  className: "nginx"  # or "traefik", "alb", "gce"
  
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    external-dns.alpha.kubernetes.io/hostname: "aurora.example.com"
  
  # TLS with cert-manager
  tls:
    enabled: true
    certManager:
      enabled: true
      issuer: "letsencrypt-prod"
      email: "[email protected]"
  
  hosts:
    frontend: "aurora.example.com"
    api: "api.aurora.example.com"
    ws: "ws.aurora.example.com"
Important Ingress Settings:

For WebSocket and long-running requests:
  • proxy-read-timeout: 3600s - RCA analysis can take 30+ minutes
  • proxy-http-version: 1.1 - Required for WebSocket upgrade
  • proxy-body-size: 50m - For file uploads
These are auto-configured for nginx. For other controllers (Traefik, ALB, GCE), configure equivalent settings via ingress.annotations.
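For instance, with the AWS Load Balancer Controller, a comparable idle timeout can be supplied through ingress.annotations (a sketch; confirm the annotation keys against your controller's documentation):

```yaml
ingress:
  annotations:
    # Raise the ALB idle timeout to cover long-running RCA requests
    alb.ingress.kubernetes.io/load-balancer-attributes: "idle_timeout.timeout_seconds=3600"
```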

Pod Isolation

Enable isolated terminal pods for untrusted code execution:
config:
  ENABLE_POD_ISOLATION: "true"
  TERMINAL_NAMESPACE: "untrusted"
  TERMINAL_IMAGE: "ghcr.io/your-org/aurora-terminal:abc123f"
  TERMINAL_POD_TTL: "3600"
  TERMINAL_RUNTIME_CLASS: "gvisor"  # Optional: gvisor, kata
The chart automatically creates:
  • Isolated namespace
  • RBAC for pod management
  • NetworkPolicy blocking cluster access
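The generated NetworkPolicy is conceptually similar to this sketch, which denies all egress from the isolated namespace except DNS (illustrative only, not the chart's exact manifest):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cluster-egress
  namespace: untrusted
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - ports:             # only DNS lookups are allowed out;
        - protocol: UDP  # all other egress is denied by default
          port: 53
```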
For additional hardening:
# Create RuntimeClass for gVisor
kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF

# Taint nodes for untrusted workloads
kubectl label node worker-1 workload=untrusted
kubectl taint node worker-1 workload=untrusted:NoSchedule
Then enable:
config:
  TERMINAL_RUNTIME_CLASS: "gvisor"
  USE_UNTRUSTED_NODES: "true"

Helm Commands

Install

helm install aurora-oss ./deploy/helm/aurora \
  --namespace aurora --create-namespace \
  -f values.generated.yaml

Upgrade

helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml

Rollback

# List revisions
helm history aurora-oss -n aurora

# Rollback to previous
helm rollback aurora-oss -n aurora

# Rollback to specific revision
helm rollback aurora-oss 3 -n aurora

Uninstall

helm uninstall aurora-oss -n aurora

# Delete namespace (removes PVCs)
kubectl delete namespace aurora

Troubleshooting

Pods Not Starting

Check pod status:
kubectl get pods -n aurora
kubectl describe pod <pod-name> -n aurora
Common issues:
  • Image pull errors (check registry authentication)
  • Resource limits (insufficient CPU/memory)
  • PVC binding issues (check storage class)

Database Connection Errors

Check Postgres:
kubectl logs -n aurora statefulset/aurora-oss-postgres --tail=50

# Test connection
kubectl exec -it -n aurora statefulset/aurora-oss-postgres -- \
  psql -U aurora -d aurora_db -c "SELECT 1;"

Vault Issues

Check Vault status:
kubectl exec -it -n aurora statefulset/aurora-oss-vault -- vault status
If sealed:
# Unseal (repeat with 3 different keys)
kubectl exec -it -n aurora statefulset/aurora-oss-vault -- \
  vault operator unseal <unseal-key-1>

Ingress Not Working

Check ingress:
kubectl get ingress -n aurora
kubectl describe ingress aurora-oss -n aurora
Verify DNS:
nslookup aurora.example.com
Test internal access:
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://aurora-oss-server:5080/health

View Logs

# All pods
kubectl logs -n aurora -l app.kubernetes.io/name=aurora --tail=100

# Specific service
kubectl logs -n aurora deployment/aurora-oss-server -f
kubectl logs -n aurora deployment/aurora-oss-celery-worker -f
kubectl logs -n aurora deployment/aurora-oss-chatbot -f

kubectl Agent

Connect Aurora to other Kubernetes clusters using the kubectl agent, which executes Kubernetes commands in remote clusters on Aurora's behalf. Quick example:
# In the cluster you want to connect
helm install aurora-kubectl-agent ./kubectl-agent/chart \
  --namespace aurora --create-namespace \
  --set aurora.backendUrl="https://api.aurora.example.com" \
  --set aurora.wsEndpoint="wss://ws.aurora.example.com/kubectl-agent" \
  --set aurora.agentToken="<token-from-aurora-ui>" \
  --set agent.image.repository="your-registry/aurora-kubectl-agent" \
  --set agent.image.tag="1.0.3"

Next Steps

Production Best Practices

Security, monitoring, and reliability

Scaling Guide

Scale Aurora for high availability

Backup & Recovery

Protect your data

Monitoring

Set up observability
