Overview
This guide covers deploying GAIA to production with proper security, monitoring, and reliability measures.
Production deployments require careful planning and security considerations. Review all sections before deploying.
Prerequisites
Before deploying to production:
- Domain name with DNS configured
- SSL certificate (Let’s Encrypt or commercial)
- Server with minimum 8GB RAM, 4 CPU cores
- Docker and Docker Compose installed
- All required API keys obtained
- Backup strategy planned
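A quick pre-flight script can confirm the host meets these minimums before you start. This is a sketch: the thresholds mirror the list above, and the checks assume a Linux host with `free` and `nproc` available.

```shell
#!/bin/bash
# Pre-flight check for the prerequisites listed above (sketch).

check_ram_mb() {
  # Pass total RAM in MB; allow slight headroom below 8192 for
  # kernel-reserved memory.
  [ "$1" -ge 7800 ]
}

check_cores() {
  [ "$1" -ge 4 ]
}

preflight() {
  local ram cores
  ram=$(free -m | awk '/^Mem:/ {print $2}')
  cores=$(nproc)
  check_ram_mb "$ram"  || echo "WARN: less than 8GB RAM (${ram}MB)"
  check_cores "$cores" || echo "WARN: fewer than 4 CPU cores ($cores)"
  command -v docker >/dev/null       || echo "WARN: docker not found"
  docker compose version >/dev/null 2>&1 || echo "WARN: docker compose not found"
}

# Run on the target server: preflight
```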
Production Architecture
Recommended Setup
┌─────────────────────────────────────────────┐
│ Load Balancer / CDN │
│ (Cloudflare, AWS ALB) │
└─────────────────┬───────────────────────────┘
│ HTTPS
┌─────────────────▼───────────────────────────┐
│ Reverse Proxy (Nginx) │
│ SSL Termination & Routing │
└─────┬──────────────────────┬────────────────┘
│ │
│ HTTP │ HTTP
┌─────▼──────────┐ ┌──────▼──────────────┐
│ GAIA Backend │ │ GAIA Frontend │
│ (FastAPI) │ │ (Next.js) │
└─────┬──────────┘ └─────────────────────┘
│
│ Internal Network
┌─────▼──────────────────────────────────────┐
│ Database Layer (Private Network) │
│ PostgreSQL │ MongoDB │ Redis │ ChromaDB │
└────────────────────────────────────────────┘
Security Layers
- Edge: CDN with DDoS protection
- Entry: Reverse proxy with SSL termination
- Application: Isolated containers with limited privileges
- Data: Private network for databases
- Backup: Encrypted off-site backups
SSL Configuration
Using Let’s Encrypt with Nginx
Install Certbot
# Ubuntu/Debian
sudo apt update
sudo apt install certbot python3-certbot-nginx
Obtain SSL certificate
sudo certbot certonly --nginx \
-d yourdomain.com \
-d api.yourdomain.com \
--email [email protected] \
--agree-tos
Configure Nginx
Create /etc/nginx/sites-available/gaia:
# API Backend
server {
listen 443 ssl http2;
server_name api.yourdomain.com;
ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
client_max_body_size 100M;
location / {
proxy_pass http://localhost:8000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts for streaming
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
}
}
# Redirect HTTP to HTTPS
server {
listen 80;
server_name api.yourdomain.com;
return 301 https://$server_name$request_uri;
}
# Frontend
server {
listen 443 ssl http2;
server_name yourdomain.com;
ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
location / {
proxy_pass http://localhost:3000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
}
}
server {
listen 80;
server_name yourdomain.com;
return 301 https://$server_name$request_uri;
}
Enable configuration
sudo ln -s /etc/nginx/sites-available/gaia /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
Setup auto-renewal
# Test renewal
sudo certbot renew --dry-run
# Certbot auto-renewal is configured via systemd timer
sudo systemctl status certbot.timer
Production Docker Compose
Environment Configuration
Create production .env file:
cd apps/api
cp .env.example .env
Essential production settings:
# Environment
ENV=production
HOST=https://api.yourdomain.com
FRONTEND_URL=https://yourdomain.com
# Databases (use strong passwords!)
POSTGRES_URL=postgresql://gaia:STRONG_PASSWORD@postgres:5432/langgraph
MONGO_DB=mongodb://gaia:STRONG_PASSWORD@mongo:27017/gaia
REDIS_URL=redis://:STRONG_PASSWORD@redis:6379
CHROMADB_HOST=chromadb
CHROMADB_PORT=8000
RABBITMQ_URL=amqp://gaia:STRONG_PASSWORD@rabbitmq:5672/
# Auth (required)
WORKOS_API_KEY=your-production-workos-key
WORKOS_CLIENT_ID=your-production-client-id
WORKOS_COOKIE_PASSWORD=generate-secure-32-char-password
# LLM (required)
OPENAI_API_KEY=sk-production-key
# Monitoring
SENTRY_DSN=https://your-sentry-dsn
POSTHOG_API_KEY=phc_your-key
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-langsmith-key
# ... additional services
Critical Security Requirements:
- Use unique, strong passwords (minimum 32 characters)
- Enable all monitoring services
- Never use default passwords
- Keep API keys secure and rotate regularly
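One way to generate passwords that satisfy these requirements is to read from /dev/urandom (a sketch; the variable names in the example are illustrative):

```shell
# Generate a 32-character alphanumeric secret from /dev/urandom.
# Alphanumeric output avoids characters that need escaping in
# connection strings like POSTGRES_URL.
gen_secret() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
  echo
}

# Example: paste into .env
# echo "WORKOS_COOKIE_PASSWORD=$(gen_secret)"
```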
Deploy Production Stack
Navigate to the Docker directory
cd /opt/gaia/infra/docker
Pull latest images
docker compose -f docker-compose.prod.yml pull
Start services
# Start all services
docker compose -f docker-compose.prod.yml up -d
# Or with specific profiles
docker compose -f docker-compose.prod.yml \
--profile backend-only \
--profile voice-agent \
up -d
Verify deployment
# Check container health
docker compose -f docker-compose.prod.yml ps
# Check API health
curl https://api.yourdomain.com/health
# View logs
docker compose -f docker-compose.prod.yml logs -f gaia-backend
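Right after `up -d`, containers can take a few seconds to become healthy, so a one-shot curl may fail spuriously. A small retry helper avoids false alarms (a sketch; the URL is a placeholder):

```shell
# Retry a command until it succeeds or attempts are exhausted.
wait_for() {
  local attempts=$1; shift
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "still unhealthy after $attempts attempts" >&2
  return 1
}

# Example: wait up to ~60s for the API
# wait_for 30 curl -fsS https://api.yourdomain.com/health
```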
Monitoring and Observability
Sentry (Error Tracking)
Track errors and exceptions:
- Sign up at sentry.io
- Create a new project for GAIA
- Add DSN to environment:
SENTRY_DSN=https://your-sentry-dsn
PostHog (Analytics)
Monitor user behavior and performance:
- Sign up at posthog.com
- Get your API key
- Add to environment:
POSTHOG_API_KEY=phc_your_key
LangSmith (LLM Tracing)
Debug and monitor LLM calls:
- Sign up at smith.langchain.com
- Create API key
- Enable in environment:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-key
Docker Container Monitoring
Monitor container resources:
# Real-time stats
docker stats
# Prometheus metrics (if configured)
curl http://localhost:9090/metrics
Health Check Endpoints
Monitor these endpoints:
# API health
curl https://api.yourdomain.com/health
# API metrics
curl https://api.yourdomain.com/metrics
# Database health
docker exec postgres pg_isready
docker exec mongo mongosh --eval "db.adminCommand('ping')"
docker exec redis redis-cli ping
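These checks can be wrapped into a single pass/fail report for cron or an external monitor. A sketch (the example check commands match those above; adapt container names to your deployment):

```shell
# Run named checks, print a one-line result for each, track failures.
FAILURES=0
check() {
  local name=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok    $name"
  else
    echo "FAIL  $name"
    FAILURES=$((FAILURES + 1))
  fi
}

# Example usage (uncomment on the server):
# check api      curl -fsS https://api.yourdomain.com/health
# check postgres docker exec postgres pg_isready
# check redis    docker exec redis redis-cli ping
# exit $FAILURES   # non-zero exit lets cron alert on failure
```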
Backup Strategy
Automated Backup Script
Create /opt/gaia/backup.sh:
#!/bin/bash
set -e
BACKUP_DIR="/opt/gaia/backups"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30
echo "Starting GAIA backup at $DATE"
# Create backup directory
mkdir -p $BACKUP_DIR/$DATE
# PostgreSQL backup
echo "Backing up PostgreSQL..."
docker exec postgres pg_dump -U postgres langgraph | \
gzip > $BACKUP_DIR/$DATE/postgres.sql.gz
# MongoDB backup
echo "Backing up MongoDB..."
docker exec mongo mongodump --db gaia --archive | \
gzip > $BACKUP_DIR/$DATE/mongo.archive.gz
# Redis backup
echo "Backing up Redis..."
docker exec redis redis-cli SAVE
docker cp redis:/data/dump.rdb $BACKUP_DIR/$DATE/redis-dump.rdb
# ChromaDB backup
echo "Backing up ChromaDB..."
docker run --rm \
-v gaia_chroma_data:/data \
-v $BACKUP_DIR/$DATE:/backup \
alpine tar czf /backup/chroma.tar.gz -C /data .
# Environment backup
echo "Backing up configuration..."
cp /opt/gaia/apps/api/.env $BACKUP_DIR/$DATE/env.backup
# Clean old backups
echo "Cleaning old backups..."
find $BACKUP_DIR -mindepth 1 -maxdepth 1 -type d -mtime +$RETENTION_DAYS -exec rm -rf {} +
# Upload to S3 (optional)
if [ -n "$S3_BUCKET" ]; then
echo "Uploading to S3..."
aws s3 sync $BACKUP_DIR/$DATE s3://$S3_BUCKET/gaia-backups/$DATE/
fi
echo "Backup completed: $BACKUP_DIR/$DATE"
Make executable and schedule:
chmod +x /opt/gaia/backup.sh
# Add to crontab (daily at 2 AM)
crontab -e
0 2 * * * /opt/gaia/backup.sh >> /var/log/gaia-backup.log 2>&1
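A corrupt dump is worst discovered during a restore. Since every archive the script writes is gzip-compressed, `gzip -t` gives a cheap integrity check that can run right after each backup (a sketch):

```shell
# Verify integrity of every gzip archive in a backup directory.
verify_backup() {
  local dir=$1 status=0 f
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue
    if gzip -t "$f" 2>/dev/null; then
      echo "ok      $f"
    else
      echo "CORRUPT $f"
      status=1
    fi
  done
  return $status
}

# Example: check the most recent backup
# verify_backup "/opt/gaia/backups/$(ls /opt/gaia/backups | tail -n 1)"
```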
Restore from Backup
To restore from a backup:
BACKUP_DATE="20260219_020000" # Adjust to your backup date
BACKUP_DIR="/opt/gaia/backups/$BACKUP_DATE"
# Stop the application so nothing writes during the restore
# (keep the databases running; docker exec needs live containers)
cd /opt/gaia/infra/docker
docker compose -f docker-compose.prod.yml stop gaia-backend
# Restore PostgreSQL
gunzip < $BACKUP_DIR/postgres.sql.gz | \
docker exec -i postgres psql -U postgres langgraph
# Restore MongoDB (--drop replaces existing collections)
gunzip < $BACKUP_DIR/mongo.archive.gz | \
docker exec -i mongo mongorestore --archive --drop
# Restore Redis (stop it first so the dump is loaded on startup)
docker compose -f docker-compose.prod.yml stop redis
docker cp $BACKUP_DIR/redis-dump.rdb redis:/data/dump.rdb
docker compose -f docker-compose.prod.yml start redis
# Restore ChromaDB (stop it first, then repopulate the volume)
docker compose -f docker-compose.prod.yml stop chromadb
docker run --rm \
-v gaia_chroma_data:/data \
-v $BACKUP_DIR:/backup \
alpine sh -c "cd /data && tar xzf /backup/chroma.tar.gz"
# Restart all services
docker compose -f docker-compose.prod.yml up -d
Always test your backup restoration process in a staging environment before relying on it for production recovery.
Scaling and High Availability
Horizontal Scaling
Scale specific services:
# In docker-compose.prod.yml
services:
gaia-backend:
deploy:
replicas: 3 # Run 3 instances
resources:
limits:
cpus: '2'
memory: 4G
Database Replication
For high availability:
- PostgreSQL: Set up streaming replication
- MongoDB: Configure a replica set
- Redis: Enable Redis Sentinel or Redis Cluster
Load Balancing
Use Nginx or HAProxy to distribute load:
upstream gaia_backend {
least_conn;
server backend1:8000;
server backend2:8000;
server backend3:8000;
}
server {
location / {
proxy_pass http://gaia_backend;
}
}
Security Hardening
Firewall Configuration
# Ubuntu UFW
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp # SSH
sudo ufw allow 80/tcp # HTTP
sudo ufw allow 443/tcp # HTTPS
sudo ufw enable
Docker Security
- Run as non-root user (already configured in Dockerfile)
- Limit container capabilities:
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
- Use read-only root filesystem where possible:
read_only: true
tmpfs:
- /tmp
Network Isolation
# In docker-compose.prod.yml
networks:
frontend:
driver: bridge
backend:
driver: bridge
internal: true # No external access
services:
gaia-backend:
networks:
- frontend
- backend
postgres:
networks:
- backend # Only accessible internally
Environment Variable Security
Option 1: Use Docker secrets
secrets:
openai_key:
file: ./secrets/openai_key.txt
services:
gaia-backend:
secrets:
- openai_key
Option 2: Use external secrets manager
# Integrate with AWS Secrets Manager, HashiCorp Vault, etc.
OPENAI_API_KEY=$(aws secretsmanager get-secret-value \
--secret-id gaia/openai-key \
--query SecretString \
--output text)
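With Docker secrets, the application sees files under /run/secrets rather than environment variables. If the app only reads env vars, an entrypoint shim can bridge the two (a sketch; the variable and file names are illustrative):

```shell
# Export an environment variable from a secret file, if it exists.
load_secret() {
  local var=$1 file=$2
  if [ -f "$file" ]; then
    export "$var"="$(cat "$file")"
  fi
}

# In the container entrypoint:
# load_secret OPENAI_API_KEY /run/secrets/openai_key
# exec "$@"
```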
Maintenance
Updates and Upgrades
# Pull latest images
cd /opt/gaia/infra/docker
docker compose -f docker-compose.prod.yml pull
# Backup before upgrade
/opt/gaia/backup.sh
# Recreate only the backend (brief restart of that container; true
# zero-downtime updates need multiple replicas behind a load balancer)
docker compose -f docker-compose.prod.yml up -d --no-deps --build gaia-backend
# Or full restart
docker compose -f docker-compose.prod.yml down
docker compose -f docker-compose.prod.yml up -d
Log Rotation
Configure Docker log rotation:
// /etc/docker/daemon.json
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
Restart Docker:
sudo systemctl restart docker
Database Maintenance
# PostgreSQL vacuum
docker exec postgres psql -U postgres -d langgraph -c "VACUUM ANALYZE;"
# MongoDB compact
docker exec mongo mongosh --eval "db.runCommand({compact: 'conversations'})"
# Redis memory cleanup
docker exec redis redis-cli MEMORY PURGE
Disaster Recovery
Recovery Plan
- Maintain off-site backups: Store backups in different geographic locations
- Document recovery procedures: Keep runbooks updated
- Test recovery regularly: Quarterly disaster recovery drills
- Monitor backup health: Automated backup verification
- RTO/RPO targets: Define acceptable downtime and data loss
Maintain a contact list of:
- Team members with access
- Cloud provider support
- Database administrators
- Security team
Troubleshooting Production Issues
High Memory Usage
# Check container memory
docker stats
# Restart high-memory containers
docker compose -f docker-compose.prod.yml restart gaia-backend
# Clear Redis cache (caution: FLUSHDB removes all keys in the current database, including sessions)
docker exec redis redis-cli FLUSHDB
Slow Response Times
- Check database query performance
- Review application logs for bottlenecks
- Monitor external API response times
- Check network latency
- Scale horizontally if needed
Database Connection Pool Exhaustion
# Adjust in postgresql.py
engine = create_async_engine(
url=url,
pool_size=20, # Increase from 5
max_overflow=30, # Increase from 10
)
SSL Certificate Expiration
# Check certificate expiration
sudo certbot certificates
# Force renewal
sudo certbot renew --force-renewal
# Reload Nginx
sudo systemctl reload nginx
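To alert before a renewal slips through, compute the days remaining from the certificate's notAfter date (a sketch assuming GNU date and openssl; the certificate path is the Let's Encrypt default used earlier):

```shell
# Days until a given expiry date (GNU date syntax).
days_until() {
  local now expiry
  now=$(date +%s)
  expiry=$(date -d "$1" +%s)
  echo $(( (expiry - now) / 86400 ))
}

# Example with the live certificate:
# not_after=$(openssl x509 -enddate -noout \
#   -in /etc/letsencrypt/live/yourdomain.com/cert.pem | cut -d= -f2)
# days=$(days_until "$not_after")
# [ "$days" -lt 14 ] && echo "ALERT: certificate expires in $days days"
```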
CDN Configuration
Use Cloudflare or similar CDN:
- Cache static assets
- DDoS protection
- Geographic distribution
- SSL termination
Database Optimization
-- PostgreSQL: Analyze slow queries
SELECT * FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 10;
-- Add indexes for frequent queries
CREATE INDEX idx_user_created ON users(created_at);
Caching Strategy
# Use Redis for frequently accessed data
# Set appropriate TTLs
await redis.setex(f"user:{user_id}", 3600, user_data)
Compliance and Auditing
GDPR Compliance
- Implement user data export
- Enable user data deletion
- Maintain audit logs
- Encrypt data at rest and in transit
Audit Logging
# Enable audit logging in PostgreSQL
shared_preload_libraries = 'pgaudit'
# MongoDB audit log
auditLog:
destination: file
format: JSON
path: /var/log/mongodb/audit.json
Next Steps
Your GAIA production deployment is now complete! Consider:
- Setting up a staging environment for testing
- Implementing a CI/CD pipeline
- Creating monitoring dashboards
- Documenting operational procedures
- Training team on maintenance tasks
For ongoing support: