## Production Checklist

**Security Hardening**

- [ ] Change all default passwords and secrets
- [ ] Generate new API keys and tokens
- [ ] Configure SSL/TLS certificates
- [ ] Enable authentication for all services

**High Availability**

- [ ] Deploy worker nodes on separate hardware
- [ ] Configure backup and restore procedures
- [ ] Set up database replication (if needed)
- [ ] Configure health checks and auto-restart

**Monitoring & Observability**

- [ ] Deploy full observability stack
- [ ] Enable metrics collection

**Resource Planning**

- [ ] Size compute resources appropriately
- [ ] Configure resource limits
## Security Hardening

### Change Default Credentials

**Critical:** all default passwords, salts, and secrets MUST be changed before production deployment.

```bash
# PentAGI security
COOKIE_SIGNING_SALT=$(openssl rand -hex 32)
PUBLIC_URL=https://pentagi.example.com
CORS_ORIGINS=https://pentagi.example.com

# PostgreSQL
PENTAGI_POSTGRES_USER=pentagi_prod
PENTAGI_POSTGRES_PASSWORD=$(openssl rand -base64 32)
PENTAGI_POSTGRES_DB=pentagidb

# Neo4j (for Graphiti)
NEO4J_USER=neo4j
NEO4J_PASSWORD=$(openssl rand -base64 32)

# Scraper
LOCAL_SCRAPER_USERNAME=pentagi_scraper
LOCAL_SCRAPER_PASSWORD=$(openssl rand -base64 32)

# Langfuse security
LANGFUSE_SALT=$(openssl rand -hex 32)
LANGFUSE_ENCRYPTION_KEY=$(openssl rand -hex 32)
LANGFUSE_NEXTAUTH_SECRET=$(openssl rand -hex 32)

# Langfuse database
LANGFUSE_POSTGRES_USER=langfuse_prod
LANGFUSE_POSTGRES_PASSWORD=$(openssl rand -base64 32)
LANGFUSE_CLICKHOUSE_USER=clickhouse
LANGFUSE_CLICKHOUSE_PASSWORD=$(openssl rand -base64 32)

# Langfuse storage
LANGFUSE_S3_ACCESS_KEY_ID=$(openssl rand -hex 16)
LANGFUSE_S3_SECRET_ACCESS_KEY=$(openssl rand -hex 32)
LANGFUSE_REDIS_AUTH=$(openssl rand -base64 32)

# Langfuse admin
LANGFUSE_INIT_USER_EMAIL=admin@example.com
LANGFUSE_INIT_USER_PASSWORD=$(openssl rand -base64 24)

# Langfuse API keys
LANGFUSE_INIT_PROJECT_PUBLIC_KEY=pk-lf-$(uuidgen)
LANGFUSE_INIT_PROJECT_SECRET_KEY=sk-lf-$(uuidgen)
```
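After regenerating secrets, it helps to confirm that nothing placeholder-like survived in the env file. A minimal sketch (the `check_env` helper and its pattern list are illustrative, not part of PentAGI; extend the patterns for your own defaults):

```shell
#!/bin/sh
# Fail if any value in an env file still looks like a placeholder or is empty.
check_env() {
  env_file="$1"
  if grep -Eq '(changeme|your_client_id|your_client_secret|=[[:space:]]*$)' "$env_file"; then
    echo "WEAK"
    return 1
  fi
  echo "OK"
}
```

Running `check_env .env` before the first production start gives a non-zero exit on weak values, which makes it easy to wire into CI.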
### SSL/TLS Configuration

**Obtain SSL certificates**

Use Let's Encrypt for free certificates:

```bash
# Install certbot
sudo apt-get install certbot

# Obtain certificate
sudo certbot certonly --standalone \
  -d pentagi.example.com \
  --email admin@example.com \
  --agree-tos

# Certificates will be in:
# /etc/letsencrypt/live/pentagi.example.com/
```

**Configure PentAGI to use certificates**

Edit `.env`:

```bash
PUBLIC_URL=https://pentagi.example.com
SERVER_SSL_CRT=/etc/letsencrypt/live/pentagi.example.com/fullchain.pem
SERVER_SSL_KEY=/etc/letsencrypt/live/pentagi.example.com/privkey.pem
SERVER_USE_SSL=true
```

Update `docker-compose.yml` to mount the certificates:

```yaml
services:
  pentagi:
    volumes:
      - /etc/letsencrypt/live/pentagi.example.com:/etc/ssl/pentagi:ro
```

**Set up automatic renewal**

```bash
# Test renewal
sudo certbot renew --dry-run

# Add cron job for auto-renewal
sudo crontab -e

# Add this line (--deploy-hook restarts PentAGI only after an actual renewal):
0 0 * * 0 certbot renew --quiet --deploy-hook "docker compose -f /opt/pentagi/docker-compose.yml restart pentagi"
```
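To catch renewal failures before they bite, you can also probe the certificate file directly. A sketch using GNU `date` (the helper name, path, and 30-day threshold are examples, not PentAGI tooling):

```shell
#!/bin/sh
# Print the number of days until a PEM certificate expires.
days_until_expiry() {
  cert="$1"
  end=$(openssl x509 -enddate -noout -in "$cert" | cut -d= -f2)
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Example cron-friendly check (threshold is an assumption):
# [ "$(days_until_expiry /etc/letsencrypt/live/pentagi.example.com/fullchain.pem)" -gt 30 ] || echo "renew soon"
```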
### Firewall Configuration

Main node:

```bash
# Enable UFW
sudo ufw enable

# Allow SSH (adjust port if needed)
sudo ufw allow 22/tcp

# Allow HTTPS for PentAGI
sudo ufw allow 443/tcp
sudo ufw allow 8443/tcp

# Allow from monitoring network (adjust subnet)
sudo ufw allow from 10.0.0.0/24 to any port 3000 comment 'Grafana'
sudo ufw allow from 10.0.0.0/24 to any port 4000 comment 'Langfuse'

# Deny all other incoming by default
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Reload to apply the rules
sudo ufw reload
```

Worker nodes:

```bash
# Enable UFW
sudo ufw enable

# Allow SSH
sudo ufw allow 22/tcp

# Allow Docker API from the main node only
sudo ufw allow from <MAIN_NODE_IP> to any port 2376 proto tcp comment 'Docker API'
sudo ufw allow from <MAIN_NODE_IP> to any port 3376 proto tcp comment 'dind API'

# Allow metrics from monitoring network
sudo ufw allow from 10.0.0.0/24 to any port 9323 comment 'Docker metrics'
sudo ufw allow from 10.0.0.0/24 to any port 9324 comment 'dind metrics'

# Allow OOB attack ports (adjust based on target networks)
sudo ufw allow 28000:30000/tcp comment 'OOB attacks'
sudo ufw allow 28000:30000/udp comment 'OOB attacks'

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw reload
```
### Authentication and Access Control

**Enable OAuth2 authentication**

Configure OAuth2 providers in `.env`:

```bash
# Google OAuth
OAUTH_GOOGLE_CLIENT_ID=your_client_id
OAUTH_GOOGLE_CLIENT_SECRET=your_client_secret

# GitHub OAuth
OAUTH_GITHUB_CLIENT_ID=your_client_id
OAUTH_GITHUB_CLIENT_SECRET=your_client_secret
```

**Disable the default admin account**

After creating OAuth users, disable the default admin account via the UI.

**Configure Langfuse SSO**

For enterprise deployments:

```bash
# Custom OAuth2 for Langfuse
LANGFUSE_AUTH_CUSTOM_CLIENT_ID=your_client_id
LANGFUSE_AUTH_CUSTOM_CLIENT_SECRET=your_client_secret
LANGFUSE_AUTH_CUSTOM_ISSUER=https://auth.example.com
LANGFUSE_AUTH_DISABLE_SIGNUP=true
```
## High Availability

### Distributed Architecture

For production deployments, use the distributed worker node architecture to isolate penetration testing workloads.

Recommended topology:

```
     ┌─────────────────┐
     │  Load Balancer  │
     │ (nginx/HAProxy) │
     └────────┬────────┘
              │
         ┌────┴────┐
         │         │
    ┌────▼───┐ ┌───▼────┐
    │ Main   │ │ Main   │
    │ Node 1 │ │ Node 2 │
    └────┬───┘ └───┬────┘
         │         │
         └────┬────┘
              │
         ┌────┴────┐
         │         │
    ┌────▼───┐ ┌───▼────┐
    │ Worker │ │ Worker │
    │ Node 1 │ │ Node 2 │
    └────────┘ └────────┘
```
### Database Backup and Restore

#### PostgreSQL Backup

```bash
# Create backup directory
mkdir -p /opt/pentagi/backups

# Automated backup script
cat > /opt/pentagi/backup-postgres.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/opt/pentagi/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

docker compose exec -T pgvector pg_dump -U postgres pentagidb | \
  gzip > "${BACKUP_DIR}/pentagi_${TIMESTAMP}.sql.gz"

# Keep only the last 7 days
find "${BACKUP_DIR}" -name "pentagi_*.sql.gz" -mtime +7 -delete
EOF

chmod +x /opt/pentagi/backup-postgres.sh

# Add to cron (daily at 2 AM), preserving any existing crontab entries
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/pentagi/backup-postgres.sh") | crontab -
```

Restore:

```bash
# Stop PentAGI
docker compose stop pentagi

# Restore backup
gunzip -c pentagi_20260220_020000.sql.gz | \
  docker compose exec -T pgvector psql -U postgres -d pentagidb

# Restart PentAGI
docker compose start pentagi
```
#### Neo4j Backup

```bash
# Backup Neo4j database
docker compose exec neo4j neo4j-admin database dump neo4j \
  --to-path=/backups/neo4j_$(date +%Y%m%d_%H%M%S).dump

# Copy from container
docker cp neo4j:/backups /opt/pentagi/backups/neo4j/
```

Restore:

```bash
# Stop Graphiti services
docker compose -f docker-compose-graphiti.yml stop

# Restore database
docker compose exec neo4j neo4j-admin database load neo4j \
  --from-path=/backups/neo4j_20260220_020000.dump

# Restart services
docker compose -f docker-compose-graphiti.yml start
```
### Health Checks and Auto-Restart

**Configure Docker health checks**

All PentAGI services already include health checks in `docker-compose.yml`:

```yaml
services:
  pentagi:
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "https://localhost:8443/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```
**Monitor with an external watchdog**

Install and configure a watchdog service:

```bash
# Install monit
sudo apt-get install monit

# Configure PentAGI monitoring (tee runs the write as root; a bare redirect would not)
cat << 'EOF' | sudo tee /etc/monit/conf.d/pentagi > /dev/null
check host pentagi with address localhost
  if failed
    port 8443
    protocol https
    request /health
    with timeout 10 seconds
    for 3 cycles
  then exec "/usr/bin/docker compose -f /opt/pentagi/docker-compose.yml restart pentagi"
EOF

sudo systemctl restart monit
```
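If monit feels heavy, the same restart-after-N-failures logic can be sketched in plain shell (the function and its parameters are ours; in practice the probe would be something like `curl -fsk https://localhost:8443/health` and the action `docker compose restart pentagi`):

```shell
#!/bin/sh
# Run the probe $3 times; if every attempt fails, run the recovery action.
watch_once() {
  probe="$1"; action="$2"; retries="${3:-3}"
  fails=0; i=0
  while [ "$i" -lt "$retries" ]; do
    $probe || fails=$((fails + 1))
    i=$((i + 1))
  done
  if [ "$fails" -eq "$retries" ]; then
    $action
    echo "restarted"
  else
    echo "healthy"
  fi
}
```

Requiring every probe in the window to fail mirrors monit's `for 3 cycles` semantics, so a single transient timeout does not trigger a restart.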
## Resource Planning

### Recommended Resources

**Small deployment** (1-2 concurrent users, basic penetration testing):

| Component   | CPU    | Memory | Storage    |
|-------------|--------|--------|------------|
| Main Node   | 4 vCPU | 8 GB   | 100 GB SSD |
| Worker Node | 4 vCPU | 8 GB   | 50 GB SSD  |
| **Total**   | 8 vCPU | 16 GB  | 150 GB     |

**Medium deployment** (5-10 concurrent users, moderate workload):

| Component     | CPU     | Memory | Storage    |
|---------------|---------|--------|------------|
| Main Node     | 8 vCPU  | 16 GB  | 250 GB SSD |
| Worker Node 1 | 8 vCPU  | 16 GB  | 100 GB SSD |
| Worker Node 2 | 8 vCPU  | 16 GB  | 100 GB SSD |
| **Total**     | 24 vCPU | 48 GB  | 450 GB     |

**Large deployment** (20+ concurrent users, heavy workload):

| Component     | CPU     | Memory | Storage    |
|---------------|---------|--------|------------|
| Main Node 1   | 16 vCPU | 32 GB  | 500 GB SSD |
| Main Node 2   | 16 vCPU | 32 GB  | 500 GB SSD |
| Worker Node 1 | 16 vCPU | 32 GB  | 200 GB SSD |
| Worker Node 2 | 16 vCPU | 32 GB  | 200 GB SSD |
| Worker Node 3 | 16 vCPU | 32 GB  | 200 GB SSD |
| **Total**     | 80 vCPU | 160 GB | 1.6 TB     |
### Resource Limits

Edit `docker-compose.yml` to add resource constraints:

```yaml
services:
  pentagi:
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G

  pgvector:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

  scraper:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
```
### Storage Management

**Monitor disk usage**

```bash
# Check Docker disk usage
docker system df -v

# Check volume sizes
docker volume ls --format '{{.Name}}' | \
  xargs -I {} sh -c 'echo "Volume: {}"; docker volume inspect {} --format "{{.Mountpoint}}" | xargs du -sh'
```

**Configure log rotation**

Already configured in `docker-compose.yml`:

```yaml
services:
  pentagi:
    logging:
      options:
        max-size: 50m
        max-file: "7"
```

**Clean up old data**

```bash
# Remove unused images older than 7 days (168h)
docker image prune -a --filter "until=168h"

# Remove unused volumes (careful!)
docker volume prune --filter "label!=keep"

# Remove old worker containers, keeping the 10 most recent
docker ps -a | grep pentagi-terminal | \
  awk '{if (NR>10) print $1}' | xargs docker rm
```
### Database Tuning

**Configure PostgreSQL performance**

Create a `postgresql.conf` override (the values below assume roughly 8 GB of RAM on the main node):

```ini
# /opt/pentagi/postgres/postgresql.conf
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10MB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
```

Mount it in `docker-compose.yml`:

```yaml
services:
  pgvector:
    volumes:
      - ./postgres/postgresql.conf:/etc/postgresql/postgresql.conf:ro
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
```
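The memory settings above follow the common rule of thumb of `shared_buffers` at about 25% of RAM and `effective_cache_size` at about 75%. A sketch to recompute them for a different node size (the helper is illustrative, and the percentages are general guidance rather than PentAGI-specific values):

```shell
#!/bin/sh
# Derive rough PostgreSQL memory settings from a RAM budget in GB.
pg_mem_hints() {
  ram_gb="$1"
  echo "shared_buffers=$((ram_gb * 25 / 100))GB effective_cache_size=$((ram_gb * 75 / 100))GB"
}
```

`pg_mem_hints 8` reproduces the 2 GB / 6 GB pair used in the override above; for a 16 GB medium-deployment node it suggests 4 GB / 12 GB.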
Optimize vector search
-- Connect to database
docker compose exec pgvector psql - U postgres - d pentagidb
-- Create indexes for vector search
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100 );
-- Analyze tables
ANALYZE memories;
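`lists = 100` suits a table of roughly 100k rows. pgvector's documented heuristic is `lists` ≈ rows/1000 for up to about 1M rows and ≈ sqrt(rows) beyond that; a quick calculator (the helper name is ours):

```shell
#!/bin/sh
# Suggest an ivfflat "lists" value from the row count (pgvector heuristic).
ivfflat_lists() {
  rows="$1"
  if [ "$rows" -le 1000000 ]; then
    echo $((rows / 1000))
  else
    awk -v r="$rows" 'BEGIN { printf "%d\n", sqrt(r) }'
  fi
}
```

Rebuild the index with a new `lists` value after large data growth, since ivfflat clusters are fixed at index creation time.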
## LLM Provider Optimization

**Rate limiting**

Configure rate limits to prevent API quota exhaustion:

```bash
# Use Redis for rate limiting (future feature)
# Currently managed by LLM provider quotas
```

**Caching**

Enable response caching to reduce API costs:

```bash
# Future feature: LLM response caching
# Will use Redis for cache storage
```

**Model selection**

Choose cost-effective models for different agent types:

```yaml
# example.custom.provider.yml
agents:
  researcher:
    model: gpt-4o-mini  # Cheaper model for research
  developer:
    model: gpt-4o       # Better model for code
  executor:
    model: gpt-4o-mini  # Cheaper for execution
```
## Network Optimization

**Enable connection pooling**

Already configured in PentAGI for database connections.

**Use HTTP/2**

Modern reverse proxies such as nginx support HTTP/2:

```nginx
server {
    listen 443 ssl http2;
    server_name pentagi.example.com;

    ssl_certificate /etc/ssl/certs/pentagi.crt;
    ssl_certificate_key /etc/ssl/private/pentagi.key;

    location / {
        proxy_pass https://localhost:8443;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
## Monitoring and Alerting

### Grafana Dashboard Configuration

**Import PentAGI dashboards**

Pre-configured dashboards are available in `observability/grafana/dashboards/`:

- PentAGI System Overview
- Docker Metrics
- PostgreSQL Performance
- LLM Usage and Costs

**Configure alert rules**

Create alert rules in Grafana:

```yaml
# High CPU usage
- alert: HighCPUUsage
  expr: rate(container_cpu_usage_seconds_total{name="pentagi"}[5m]) > 0.8
  for: 5m
  annotations:
    summary: "PentAGI high CPU usage"

# High memory usage
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes{name="pentagi"} / container_spec_memory_limit_bytes{name="pentagi"} > 0.9
  for: 5m
  annotations:
    summary: "PentAGI high memory usage"

# Database connection issues
- alert: DatabaseConnectionFailure
  expr: pg_up == 0
  for: 1m
  annotations:
    summary: "PostgreSQL is down"
```

**Configure notification channels**

Set up notifications in Grafana via Email, Slack, PagerDuty, or a custom webhook.
### Log Analysis

**Access logs via Loki**

Query logs in Grafana using LogQL:

```
# All PentAGI errors
{container_name="pentagi"} |= "ERROR"

# Database query performance
{container_name="pgvector"} |= "slow query"

# LLM API errors
{container_name="pentagi"} |= "OpenAI" |= "error"
```

**Export logs for analysis**

```bash
# Export logs to file
docker compose logs --since 24h pentagi > pentagi-$(date +%Y%m%d).log

# Search for errors
grep -i error pentagi-$(date +%Y%m%d).log
```
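For quick triage of an exported log, a per-hour error histogram is often enough. A sketch assuming ISO-style timestamps at the start of each line (the helper name is ours; adjust the prefix length if your log format differs):

```shell
#!/bin/sh
# Count error lines per hour, keyed by the "YYYY-MM-DDTHH" prefix.
errors_per_hour() {
  grep -i error "$1" | awk '{ h = substr($1, 1, 13); count[h]++ }
    END { for (h in count) print count[h], h }' | sort -k2
}
```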
## Disaster Recovery

### Backup Strategy

- **Daily backups:** automated daily backups of databases and configuration
- **Weekly archives:** full system snapshots retained for 4 weeks
- **Offsite storage:** critical backups replicated to remote storage
- **Point-in-time recovery:** PostgreSQL WAL archiving for recovery to any point

### Recovery Procedures

Full system recovery:

1. Provision new infrastructure
2. Install Docker and Docker Compose
3. Restore `.env` configuration from backup
4. Restore the PostgreSQL database
5. Restore the Neo4j database (if using Graphiti)
6. Start all services
7. Verify functionality
8. Update DNS if needed

Database corruption:

1. Stop affected services
2. Restore the database from the most recent backup
3. Replay transaction logs if available
4. Verify data integrity
5. Restart services
6. Monitor for issues

Certificate expiration:

1. Generate new certificates
2. Update `.env` with the new certificate paths
3. Restart the PentAGI service
4. Verify HTTPS access
5. Update monitoring to alert 30 days before expiration
## Compliance and Auditing

### Audit Logging

**Enable audit logs**

PentAGI automatically logs all actions. Configure retention:

```bash
# Set log retention in days
PENTAGI_LOG_RETENTION_DAYS=90
```

**Export audit logs**

```bash
# Export logs for compliance
docker compose exec pgvector psql -U postgres -d pentagidb -c \
  "COPY (SELECT * FROM audit_logs WHERE created_at > NOW() - INTERVAL '30 days') \
  TO STDOUT CSV HEADER" > audit_$(date +%Y%m).csv
```
### Data Retention Policies

Configure in `.env`:

```bash
# Retain task results for 90 days
TASK_RETENTION_DAYS=90

# Retain worker logs for 30 days
WORKER_LOG_RETENTION_DAYS=30

# Retain LLM traces for 60 days
LLM_TRACE_RETENTION_DAYS=60
```
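The `*_RETENTION_DAYS` settings are applied by PentAGI itself; for ad-hoc artifacts such as exported logs and compliance CSVs, a small mtime-based sweep does the same job (the helper name is ours):

```shell
#!/bin/sh
# Delete files older than $2 days under $1 and report how many were removed.
purge_older_than() {
  dir="$1"; days="$2"
  find "$dir" -type f -mtime +"$days" -print -delete | wc -l
}
```

For example, `purge_older_than /opt/pentagi/exports 30` keeps exports in step with the 30-day worker-log policy.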
## Next Steps

- **Troubleshooting**: common production issues and solutions
- **Scaling Guide**: scale PentAGI for larger workloads
- **Security Hardening**: advanced security configurations
- **Monitoring**: deep dive into monitoring and metrics