
Overview

This guide covers deploying GAIA to production with proper security, monitoring, and reliability measures.
Production deployments require careful planning and security considerations. Review all sections before deploying.

Prerequisites

Before deploying to production:
  • Domain name with DNS configured
  • SSL certificate (Let’s Encrypt or commercial)
  • Server with minimum 8GB RAM, 4 CPU cores
  • Docker and Docker Compose installed
  • All required API keys obtained
  • Backup strategy planned

Production Architecture

┌─────────────────────────────────────────────┐
│              Load Balancer / CDN            │
│            (Cloudflare, AWS ALB)            │
└─────────────────┬───────────────────────────┘
                  │ HTTPS
┌─────────────────▼───────────────────────────┐
│           Reverse Proxy (Nginx)             │
│         SSL Termination & Routing           │
└─────┬──────────────────────┬────────────────┘
      │                      │
      │ HTTP                 │ HTTP
┌─────▼──────────┐    ┌──────▼──────────────┐
│  GAIA Backend  │    │   GAIA Frontend     │
│   (FastAPI)    │    │    (Next.js)        │
└─────┬──────────┘    └─────────────────────┘
      │ Internal Network
┌─────▼──────────────────────────────────────┐
│     Database Layer (Private Network)       │
│  PostgreSQL │ MongoDB │ Redis │ ChromaDB   │
└────────────────────────────────────────────┘

Security Layers

  1. Edge: CDN with DDoS protection
  2. Entry: Reverse proxy with SSL termination
  3. Application: Isolated containers with limited privileges
  4. Data: Private network for databases
  5. Backup: Encrypted off-site backups

SSL Configuration

Using Let’s Encrypt with Nginx

1. Install Certbot

# Ubuntu/Debian
sudo apt update
sudo apt install certbot python3-certbot-nginx

2. Obtain SSL certificate

sudo certbot certonly --nginx \
  -d yourdomain.com \
  -d api.yourdomain.com \
  --email admin@yourdomain.com \
  --agree-tos

3. Configure Nginx

Create /etc/nginx/sites-available/gaia:
# API Backend
server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;
    
    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    
    client_max_body_size 100M;
    
    location / {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # Timeouts for streaming
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name api.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

# Frontend
server {
    listen 443 ssl http2;
    server_name yourdomain.com;
    
    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    
    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}

4. Enable configuration

sudo ln -s /etc/nginx/sites-available/gaia /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

5. Set up auto-renewal

# Test renewal
sudo certbot renew --dry-run

# Certbot auto-renewal is configured via systemd timer
sudo systemctl status certbot.timer
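Certbot's timer handles renewal, but an independent expiry check makes a good safety net for monitoring jobs. A minimal sketch using only the Python standard library (the function names and hostname are illustrative):

```python
import socket
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now=None):
    """Days remaining given a certificate's notAfter field,
    e.g. 'Jun 1 12:00:00 2026 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400

def fetch_not_after(host, port=443):
    """Fetch the peer certificate's notAfter field via a TLS handshake."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

# Usage (replace the hostname with your own domain):
# if days_until_expiry(fetch_not_after("api.yourdomain.com")) < 14:
#     alert the on-call engineer
```

Run it from the same monitoring job that checks `/health`, and alert well before the 30-day renewal window closes.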

Production Docker Compose

Environment Configuration

Create production .env file:
cd apps/api
cp .env.example .env
Essential production settings:
# Environment
ENV=production
HOST=https://api.yourdomain.com
FRONTEND_URL=https://yourdomain.com

# Databases (use strong passwords!)
POSTGRES_URL=postgresql://gaia:STRONG_PASSWORD@postgres:5432/langgraph
MONGO_DB=mongodb://gaia:STRONG_PASSWORD@mongo:27017/gaia
REDIS_URL=redis://:STRONG_PASSWORD@redis:6379
CHROMADB_HOST=chromadb
CHROMADB_PORT=8000
RABBITMQ_URL=amqp://gaia:STRONG_PASSWORD@rabbitmq:5672/

# Auth (required)
WORKOS_API_KEY=your-production-workos-key
WORKOS_CLIENT_ID=your-production-client-id
WORKOS_COOKIE_PASSWORD=generate-secure-32-char-password

# LLM (required)
OPENAI_API_KEY=sk-production-key

# Monitoring
SENTRY_DSN=https://your-sentry-dsn
POSTHOG_API_KEY=phc_your-key
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-langsmith-key

# ... additional services
Critical Security Requirements:
  • Use unique, strong passwords (minimum 32 characters)
  • Enable all monitoring services
  • Never use default passwords
  • Keep API keys secure and rotate regularly
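One way to generate values such as WORKOS_COOKIE_PASSWORD is Python's secrets module; a sketch (any CSPRNG-based generator works equally well):

```python
import secrets

def generate_secret(n_bytes: int = 32) -> str:
    """URL-safe random secret; 32 bytes of entropy encodes to ~43 characters,
    comfortably above the 32-character minimum."""
    return secrets.token_urlsafe(n_bytes)

# Print a value suitable for a cookie password or database password
print(generate_secret())
```

Never derive secrets from timestamps, hostnames, or the `random` module — only a cryptographic source.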

Deploy Production Stack

1. Navigate to Docker directory

cd infra/docker

2. Pull latest images

docker compose -f docker-compose.prod.yml pull

3. Start services

# Start all services
docker compose -f docker-compose.prod.yml up -d

# Or with specific profiles
docker compose -f docker-compose.prod.yml \
  --profile backend-only \
  --profile voice-agent \
  up -d

4. Verify deployment

# Check container health
docker compose -f docker-compose.prod.yml ps

# Check API health
curl https://api.yourdomain.com/health

# View logs
docker compose -f docker-compose.prod.yml logs -f gaia-backend
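The container health check above can also be scripted. A sketch that flags containers that are not running and healthy, assuming the NDJSON output of recent Docker Compose v2 (`ps --format json` emits one object per line with `Service`, `State`, and `Health` fields; older releases emit a single JSON array instead):

```python
import json
import subprocess

def parse_unhealthy(ps_json: str):
    """Given NDJSON from `docker compose ps --format json`, return the
    services that are not both running and healthy."""
    bad = []
    for line in ps_json.splitlines():
        if not line.strip():
            continue
        container = json.loads(line)
        state = container.get("State")
        health = container.get("Health", "")
        # An empty Health field means the service defines no healthcheck
        if state != "running" or health not in ("", "healthy"):
            bad.append(container.get("Service", container.get("Name", "?")))
    return bad

def unhealthy_services(compose_file="docker-compose.prod.yml"):
    out = subprocess.run(
        ["docker", "compose", "-f", compose_file, "ps", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_unhealthy(out.stdout)
```

Wire it into cron or your monitoring system and alert when the returned list is non-empty.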

Monitoring and Observability

Sentry (Error Tracking)

Track errors and exceptions:
  1. Sign up at sentry.io
  2. Create a new project for GAIA
  3. Add DSN to environment:
    SENTRY_DSN=https://your-key@your-org.ingest.sentry.io/your-project-id
    

PostHog (Analytics)

Monitor user behavior and performance:
  1. Sign up at posthog.com
  2. Get your API key
  3. Add to environment:
    POSTHOG_API_KEY=phc_your_key
    

LangSmith (LLM Tracing)

Debug and monitor LLM calls:
  1. Sign up at smith.langchain.com
  2. Create API key
  3. Enable in environment:
    LANGSMITH_TRACING=true
    LANGSMITH_API_KEY=your-key
    

Docker Container Monitoring

Monitor container resources:
# Real-time stats
docker stats

# Prometheus metrics (if configured)
curl http://localhost:9090/metrics

Health Check Endpoints

Monitor these endpoints:
# API health
curl https://api.yourdomain.com/health

# API metrics
curl https://api.yourdomain.com/metrics

# Database health
docker exec postgres pg_isready
docker exec mongo mongosh --eval "db.adminCommand('ping')"
docker exec redis redis-cli ping

Backup Strategy

Automated Backup Script

Create /opt/gaia/backup.sh:
#!/bin/bash
set -e

BACKUP_DIR="/opt/gaia/backups"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

echo "Starting GAIA backup at $DATE"

# Create backup directory
mkdir -p $BACKUP_DIR/$DATE

# PostgreSQL backup
echo "Backing up PostgreSQL..."
docker exec postgres pg_dump -U postgres langgraph | \
  gzip > $BACKUP_DIR/$DATE/postgres.sql.gz

# MongoDB backup
echo "Backing up MongoDB..."
docker exec mongo mongodump --db gaia --archive | \
  gzip > $BACKUP_DIR/$DATE/mongo.archive.gz

# Redis backup
echo "Backing up Redis..."
docker exec redis redis-cli SAVE
docker cp redis:/data/dump.rdb $BACKUP_DIR/$DATE/redis-dump.rdb

# ChromaDB backup
echo "Backing up ChromaDB..."
docker run --rm \
  -v gaia_chroma_data:/data \
  -v $BACKUP_DIR/$DATE:/backup \
  alpine tar czf /backup/chroma.tar.gz -C /data .

# Environment backup
echo "Backing up configuration..."
cp /opt/gaia/apps/api/.env $BACKUP_DIR/$DATE/env.backup

# Clean old backups (-mindepth 1 keeps $BACKUP_DIR itself from matching)
echo "Cleaning old backups..."
find $BACKUP_DIR -mindepth 1 -maxdepth 1 -type d -mtime +$RETENTION_DAYS -exec rm -rf {} +

# Upload to S3 (optional)
if [ -n "$S3_BUCKET" ]; then
  echo "Uploading to S3..."
  aws s3 sync $BACKUP_DIR/$DATE s3://$S3_BUCKET/gaia-backups/$DATE/
fi

echo "Backup completed: $BACKUP_DIR/$DATE"
Make executable and schedule:
chmod +x /opt/gaia/backup.sh

# Schedule daily at 2 AM: run `crontab -e` and add this line
0 2 * * * /opt/gaia/backup.sh >> /var/log/gaia-backup.log 2>&1
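A backup is only useful if it restores, and a cheap first-line check is verifying that every compressed archive decompresses cleanly. A sketch that streams through each `.gz` file the script above produces (paths are illustrative):

```python
import gzip
from pathlib import Path

def verify_gzip_backups(backup_dir):
    """Check that every .gz file in a backup directory decompresses cleanly.

    Returns a list of (filename, error) pairs; an empty list means all
    archives are intact. This catches truncated or corrupted files, but
    not logically bad dumps — test full restores separately.
    """
    failures = []
    for path in sorted(Path(backup_dir).glob("*.gz")):
        try:
            with gzip.open(path, "rb") as f:
                while f.read(1 << 20):  # stream through the whole archive
                    pass
        except (OSError, EOFError) as exc:
            failures.append((path.name, str(exc)))
    return failures

# Usage: verify the most recent backup after backup.sh runs
# latest = max(Path("/opt/gaia/backups").iterdir())
# assert not verify_gzip_backups(latest), "backup verification failed"
```

This sketch complements, not replaces, the staged restore drills recommended below.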

Restore from Backup

To restore from a backup:
BACKUP_DATE="20260219_020000"  # Adjust to your backup date
BACKUP_DIR="/opt/gaia/backups/$BACKUP_DATE"

# Stop the application but keep the database containers running,
# since the restore commands below exec into them
cd /opt/gaia/infra/docker
docker compose -f docker-compose.prod.yml stop gaia-backend

# Restore PostgreSQL
gunzip < $BACKUP_DIR/postgres.sql.gz | \
  docker exec -i postgres psql -U postgres langgraph

# Restore MongoDB
gunzip < $BACKUP_DIR/mongo.archive.gz | \
  docker exec -i mongo mongorestore --archive --db gaia

# Restore Redis (stop Redis first so the dump is loaded on startup
# instead of being overwritten by a save on shutdown)
docker compose -f docker-compose.prod.yml stop redis
docker cp $BACKUP_DIR/redis-dump.rdb redis:/data/dump.rdb

# Restore ChromaDB (stop it first so the volume is not written during restore)
docker compose -f docker-compose.prod.yml stop chromadb
docker run --rm \
  -v gaia_chroma_data:/data \
  -v $BACKUP_DIR:/backup \
  alpine sh -c "cd /data && tar xzf /backup/chroma.tar.gz"

# Restart services
docker compose -f docker-compose.prod.yml up -d
Always test your backup restoration process in a staging environment before relying on it for production recovery.

Scaling and High Availability

Horizontal Scaling

Scale specific services:
# In docker-compose.prod.yml
services:
  gaia-backend:
    deploy:
      replicas: 3  # Run 3 instances
      resources:
        limits:
          cpus: '2'
          memory: 4G

Database Replication

For high availability:
  • PostgreSQL: Set up streaming replication
  • MongoDB: Configure a replica set
  • Redis: Enable Redis Sentinel or Redis Cluster

Load Balancing

Use Nginx or HAProxy to distribute load:
upstream gaia_backend {
    least_conn;
    server backend1:8000;
    server backend2:8000;
    server backend3:8000;
}

server {
    location / {
        proxy_pass http://gaia_backend;
    }
}

Security Hardening

Firewall Configuration

# Ubuntu UFW
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable

Docker Security

  1. Run as non-root user (already configured in Dockerfile)
  2. Limit container capabilities:
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    
  3. Use read-only root filesystem where possible:
    read_only: true
    tmpfs:
      - /tmp
    

Network Isolation

# In docker-compose.prod.yml
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No external access

services:
  gaia-backend:
    networks:
      - frontend
      - backend
  
  postgres:
    networks:
      - backend  # Only accessible internally

Environment Variable Security

Option 1: Use Docker secrets
secrets:
  openai_key:
    file: ./secrets/openai_key.txt

services:
  gaia-backend:
    secrets:
      - openai_key
Option 2: Use external secrets manager
# Integrate with AWS Secrets Manager, HashiCorp Vault, etc.
OPENAI_API_KEY=$(aws secretsmanager get-secret-value \
  --secret-id gaia/openai-key \
  --query SecretString \
  --output text)
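With Docker secrets, the application reads each secret from a file mounted under /run/secrets rather than from the environment. A sketch of the reading side (the `openai_key` name matches the compose snippet above; the env-var fallback keeps local development working):

```python
import os
from pathlib import Path

def read_secret(name, secrets_dir="/run/secrets"):
    """Read a Docker secret file, falling back to an environment variable.

    Docker mounts each secret as /run/secrets/<name>; outside Docker,
    NAME (upper-cased) is read from the environment instead.
    """
    path = Path(secrets_dir) / name
    if path.exists():
        return path.read_text().strip()  # strip the trailing newline
    return os.environ.get(name.upper(), "")

# Usage at application startup:
# openai_api_key = read_secret("openai_key")
```

Either way, secrets stay out of `docker inspect` output and shell history, which is the point of moving them off plain environment variables.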

Maintenance

Updates and Upgrades

# Pull latest images
cd /opt/gaia/infra/docker
docker compose -f docker-compose.prod.yml pull

# Backup before upgrade
/opt/gaia/backup.sh

# Update only the backend (brief restart of that container)
docker compose -f docker-compose.prod.yml up -d --no-deps gaia-backend

# Or full restart
docker compose -f docker-compose.prod.yml down
docker compose -f docker-compose.prod.yml up -d

Log Rotation

Configure Docker log rotation:
// /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
Restart Docker:
sudo systemctl restart docker

Database Maintenance

# PostgreSQL vacuum
docker exec postgres psql -U postgres -d langgraph -c "VACUUM ANALYZE;"

# MongoDB compact
docker exec mongo mongosh --eval "db.runCommand({compact: 'conversations'})"

# Redis memory cleanup
docker exec redis redis-cli MEMORY PURGE

Disaster Recovery

Recovery Plan

  1. Maintain off-site backups: Store backups in different geographic locations
  2. Document recovery procedures: Keep runbooks updated
  3. Test recovery regularly: Quarterly disaster recovery drills
  4. Monitor backup health: Automated backup verification
  5. RTO/RPO targets: Define acceptable downtime and data loss

Emergency Contacts

Maintain list of:
  • Team members with access
  • Cloud provider support
  • Database administrators
  • Security team

Troubleshooting Production Issues

High Memory Usage

# Check container memory
docker stats

# Restart high-memory containers
docker compose -f docker-compose.prod.yml restart gaia-backend

# Clear Redis cache (caution: FLUSHDB removes all keys in the current
# database, including any session or queue data stored there)
docker exec redis redis-cli FLUSHDB

Slow Response Times

  1. Check database query performance
  2. Review application logs for bottlenecks
  3. Monitor external API response times
  4. Check network latency
  5. Scale horizontally if needed
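For step 2, a lightweight timing helper makes bottlenecks visible in the logs without pulling in a full profiler. A minimal sketch (the logger name and threshold are illustrative):

```python
import logging
import time

logger = logging.getLogger("gaia.timing")

class Timer:
    """Context manager that records elapsed wall time in milliseconds
    and logs a warning when a block exceeds its threshold."""

    def __init__(self, label="block", slow_ms=500.0):
        self.label = label
        self.slow_ms = slow_ms

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.elapsed_ms = (time.perf_counter() - self._start) * 1000
        if self.elapsed_ms >= self.slow_ms:
            logger.warning("%s took %.0f ms", self.label, self.elapsed_ms)
        return False  # never swallow exceptions

# Usage around a suspect database call:
# with Timer("mongo.find_conversations", slow_ms=200):
#     docs = fetch_conversations()
```

Grep the warnings to see which sections dominate before deciding whether to index, cache, or scale.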

Database Connection Pool Exhaustion

# Adjust in postgresql.py
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    url=url,
    pool_size=20,        # SQLAlchemy default is 5
    max_overflow=30,     # SQLAlchemy default is 10
)

SSL Certificate Expiration

# Check certificate expiration
sudo certbot certificates

# Force renewal
sudo certbot renew --force-renewal

# Reload Nginx
sudo systemctl reload nginx

Performance Optimization

CDN Configuration

Use Cloudflare or similar CDN:
  • Cache static assets
  • DDoS protection
  • Geographic distribution
  • SSL termination

Database Optimization

-- PostgreSQL: analyze slow queries (requires the pg_stat_statements extension)
SELECT * FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 10;

-- Add indexes for frequent queries
CREATE INDEX idx_user_created ON users(created_at);

Caching Strategy

# Use Redis for frequently accessed data
# Set appropriate TTLs
await redis.setex(f"user:{user_id}", 3600, user_data)
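The setex call above is one half of the cache-aside pattern; the full read-through flow looks roughly like this (shown synchronously for clarity — the async Redis client works the same way with await, and the helper accepts any client with redis-style get/setex methods):

```python
import json

def get_or_set(cache, key, loader, ttl_seconds=3600):
    """Cache-aside: return the cached value, or load it, cache it with a
    TTL, and return it. Values are stored as JSON so the pattern works
    for dicts as well as plain strings."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = loader()  # cache miss: hit the database or upstream API
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value

# Usage (hypothetical loader):
# user = get_or_set(redis_client, f"user:{user_id}",
#                   lambda: load_user_from_db(user_id))
```

Keep TTLs short for data that changes often, and remember that FLUSHDB (used in troubleshooting above) wipes every key at once.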

Compliance and Auditing

GDPR Compliance

  • Implement user data export
  • Enable user data deletion
  • Maintain audit logs
  • Encrypt data at rest and in transit

Audit Logging

# postgresql.conf: enable the pgaudit extension
shared_preload_libraries = 'pgaudit'

# mongod.conf: audit log (requires MongoDB Enterprise)
auditLog:
  destination: file
  format: JSON
  path: /var/log/mongodb/audit.json

Next Steps

Your GAIA production deployment is now complete! Consider:
  • Setting up staging environment for testing
  • Implementing CI/CD pipeline
  • Creating monitoring dashboards
  • Documenting operational procedures
  • Training team on maintenance tasks