Production Checklist

1. Security Hardening

  • Change all default passwords and secrets
  • Generate new API keys and tokens
  • Configure SSL/TLS certificates
  • Set up firewall rules
  • Enable authentication for all services

2. High Availability

  • Deploy worker nodes on separate hardware
  • Configure backup and restore procedures
  • Set up database replication (if needed)
  • Implement load balancing
  • Configure health checks and auto-restart

3. Monitoring & Observability

  • Deploy full observability stack
  • Configure alerting rules
  • Set up log aggregation
  • Enable metrics collection
  • Configure trace sampling

4. Resource Planning

  • Size compute resources appropriately
  • Configure resource limits
  • Plan storage capacity
  • Set up backup storage
  • Monitor resource usage

Security Hardening

Change Default Credentials

Critical: All default passwords, salts, and secrets MUST be changed before production deployment.
```bash
# PentAGI Security
COOKIE_SIGNING_SALT=$(openssl rand -hex 32)
PUBLIC_URL=https://pentagi.example.com
CORS_ORIGINS=https://pentagi.example.com

# PostgreSQL
PENTAGI_POSTGRES_USER=pentagi_prod
PENTAGI_POSTGRES_PASSWORD=$(openssl rand -base64 32)
PENTAGI_POSTGRES_DB=pentagidb

# Neo4j (for Graphiti)
NEO4J_USER=neo4j
NEO4J_PASSWORD=$(openssl rand -base64 32)

# Scraper
LOCAL_SCRAPER_USERNAME=pentagi_scraper
LOCAL_SCRAPER_PASSWORD=$(openssl rand -base64 32)
```
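
Note that the `$(openssl ...)` substitutions above only expand in a shell; a `.env` file read by Docker Compose takes literal values. A minimal sketch for generating the secrets up front and writing them to a file (the `.env.secrets` filename is an example, not part of PentAGI):

```shell
# Generate secrets in the shell and write literal values to a file.
# umask 077 ensures the file is created readable only by its owner.
umask 077
cat > .env.secrets <<EOF
COOKIE_SIGNING_SALT=$(openssl rand -hex 32)
PENTAGI_POSTGRES_PASSWORD=$(openssl rand -base64 32)
NEO4J_PASSWORD=$(openssl rand -base64 32)
LOCAL_SCRAPER_PASSWORD=$(openssl rand -base64 32)
EOF
chmod 600 .env.secrets
```

Merge the generated values into your real `.env`, then delete the temporary file.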

SSL/TLS Configuration

1. Obtain SSL certificates

Use Let's Encrypt for free certificates:

```bash
# Install certbot
sudo apt-get install certbot

# Obtain a certificate
sudo certbot certonly --standalone \
  -d pentagi.example.com \
  --email [email protected] \
  --agree-tos

# Certificates will be in:
# /etc/letsencrypt/live/pentagi.example.com/
```
2. Configure PentAGI to use certificates

Edit `.env` (the certificate paths are the locations inside the container, matching the mount below):

```bash
PUBLIC_URL=https://pentagi.example.com
SERVER_SSL_CRT=/etc/ssl/pentagi/fullchain.pem
SERVER_SSL_KEY=/etc/ssl/pentagi/privkey.pem
SERVER_USE_SSL=true
```

Update `docker-compose.yml` to mount the certificates:

```yaml
services:
  pentagi:
    volumes:
      - /etc/letsencrypt/live/pentagi.example.com:/etc/ssl/pentagi:ro
```
3. Set up automatic renewal

```bash
# Test renewal
sudo certbot renew --dry-run

# Add a cron job for auto-renewal (weekly, Sunday at midnight)
sudo crontab -e
# Add this line:
0 0 * * 0 certbot renew --quiet && docker compose restart pentagi
```

Firewall Configuration

```bash
# Set default policies before enabling the firewall
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH first so you don't lock yourself out (adjust port if needed)
sudo ufw allow 22/tcp

# Allow HTTPS for PentAGI
sudo ufw allow 443/tcp
sudo ufw allow 8443/tcp

# Allow access from the monitoring network (adjust subnet)
sudo ufw allow from 10.0.0.0/24 to any port 3000 comment 'Grafana'
sudo ufw allow from 10.0.0.0/24 to any port 4000 comment 'Langfuse'

# Enable the firewall and apply the rules
sudo ufw enable
sudo ufw reload
```

Authentication and Access Control

1. Enable OAuth2 authentication

Configure OAuth2 providers in `.env`:

```bash
# Google OAuth
OAUTH_GOOGLE_CLIENT_ID=your_client_id
OAUTH_GOOGLE_CLIENT_SECRET=your_client_secret

# GitHub OAuth
OAUTH_GITHUB_CLIENT_ID=your_client_id
OAUTH_GITHUB_CLIENT_SECRET=your_client_secret
```
2. Disable the default admin account

After creating OAuth users, disable the default admin account via the UI.
3. Configure Langfuse SSO

For enterprise deployments:

```bash
# Custom OAuth2 for Langfuse
LANGFUSE_AUTH_CUSTOM_CLIENT_ID=your_client_id
LANGFUSE_AUTH_CUSTOM_CLIENT_SECRET=your_client_secret
LANGFUSE_AUTH_CUSTOM_ISSUER=https://auth.example.com
LANGFUSE_AUTH_DISABLE_SIGNUP=true
```

High Availability

Distributed Architecture

For production deployments, use the distributed worker node architecture to isolate penetration testing workloads.
Recommended topology:

```
   ┌──────────────────┐
   │   Load Balancer  │
   │  (nginx/HAProxy) │
   └────────┬─────────┘
       ┌────┴────┐
       │         │
  ┌────▼───┐ ┌───▼────┐
  │ Main   │ │ Main   │
  │ Node 1 │ │ Node 2 │
  └────┬───┘ └───┬────┘
       └────┬────┘
       ┌────┴────┐
       │         │
  ┌────▼───┐ ┌───▼────┐
  │ Worker │ │ Worker │
  │ Node 1 │ │ Node 2 │
  └────────┘ └────────┘
```
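
As a sketch, the load-balancer tier above could be an nginx instance proxying to the two main nodes. The hostnames `main1.internal` and `main2.internal` are placeholders, not part of the PentAGI distribution:

```nginx
upstream pentagi_main {
    server main1.internal:8443;
    server main2.internal:8443;
}

server {
    listen 443 ssl;
    server_name pentagi.example.com;

    ssl_certificate     /etc/ssl/certs/pentagi.crt;
    ssl_certificate_key /etc/ssl/private/pentagi.key;

    location / {
        proxy_pass https://pentagi_main;
        # Keep the upstream connection HTTP/1.1 so WebSocket upgrades work
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```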

Database Backup and Restore

```bash
# Create backup directory
mkdir -p /opt/pentagi/backups

# Automated backup script
cat > /opt/pentagi/backup-postgres.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/opt/pentagi/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

docker compose exec -T pgvector pg_dump -U postgres pentagidb | \
  gzip > "${BACKUP_DIR}/pentagi_${TIMESTAMP}.sql.gz"

# Keep only the last 7 days
find "${BACKUP_DIR}" -name "pentagi_*.sql.gz" -mtime +7 -delete
EOF

chmod +x /opt/pentagi/backup-postgres.sh

# Add to cron (daily at 2 AM) without replacing any existing entries
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/pentagi/backup-postgres.sh") | crontab -
```
Restore:

```bash
# Stop PentAGI
docker compose stop pentagi

# Restore the backup
gunzip -c pentagi_20260220_020000.sql.gz | \
  docker compose exec -T pgvector psql -U postgres -d pentagidb

# Restart PentAGI
docker compose start pentagi
```
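
Before restoring, it is worth checking that the archive is intact. A small sketch; the `verify_backup` helper name is ours, not part of PentAGI:

```shell
# verify_backup FILE: succeed only if FILE is a valid, non-empty gzip archive
verify_backup() {
  gunzip -t "$1" 2>/dev/null && [ -n "$(gunzip -c "$1" | head -c 1)" ]
}
```

Usage: `verify_backup pentagi_20260220_020000.sql.gz && echo "archive OK"`.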

Health Checks and Auto-Restart

1. Configure Docker health checks

All PentAGI services already include health checks in `docker-compose.yml`:

```yaml
services:
  pentagi:
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "https://localhost:8443/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```
2. Monitor with an external watchdog

Install and configure a watchdog service:

```bash
# Install monit
sudo apt-get install monit

# Configure PentAGI monitoring (sudo does not apply to shell redirection,
# so use tee to write the file as root)
cat << 'EOF' | sudo tee /etc/monit/conf.d/pentagi > /dev/null
check host pentagi with address localhost
  if failed
    port 8443
    protocol https
    request /health
    with timeout 10 seconds
    for 3 cycles
  then exec "/usr/bin/docker compose -f /opt/pentagi/docker-compose.yml restart pentagi"
EOF

sudo systemctl restart monit
```

Resource Planning

Use case: 1-2 concurrent users, basic penetration testing

| Component   | CPU    | Memory | Storage    |
| ----------- | ------ | ------ | ---------- |
| Main Node   | 4 vCPU | 8 GB   | 100 GB SSD |
| Worker Node | 4 vCPU | 8 GB   | 50 GB SSD  |
| Total       | 8 vCPU | 16 GB  | 150 GB     |

Configure Resource Limits

Edit `docker-compose.yml` to add resource constraints:

```yaml
services:
  pentagi:
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G

  pgvector:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

  scraper:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
```

Storage Management

1. Monitor disk usage

```bash
# Check Docker disk usage
docker system df -v

# Check volume sizes
docker volume ls --format '{{.Name}}' | \
  xargs -I {} sh -c 'echo "Volume: {}"; docker volume inspect {} --format "{{.Mountpoint}}" | xargs du -sh'
```
2. Configure log rotation

Already configured in `docker-compose.yml`:

```yaml
services:
  pentagi:
    logging:
      options:
        max-size: 50m
        max-file: "7"
```
3. Clean up old data

```bash
# Remove unused images older than 7 days
docker image prune -a --filter "until=168h"

# Remove unused volumes (careful!)
docker volume prune --filter "label!=keep"

# Remove old worker containers, keeping the 10 most recent
docker ps -a | grep pentagi-terminal | \
  awk '{if (NR>10) print $1}' | xargs -r docker rm
```

Performance Optimization

Database Tuning

1. Configure PostgreSQL performance

Create a `postgresql.conf` override:

```ini
# /opt/pentagi/postgres/postgresql.conf
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10MB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
```

Mount it in `docker-compose.yml`:

```yaml
services:
  pgvector:
    volumes:
      - ./postgres/postgresql.conf:/etc/postgresql/postgresql.conf:ro
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
```
2. Optimize vector search

Connect to the database:

```bash
docker compose exec pgvector psql -U postgres -d pentagidb
```

Then, inside `psql`:

```sql
-- Create an index for vector search
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Analyze tables
ANALYZE memories;
```

LLM Provider Optimization

Configure rate limits to prevent API quota exhaustion. Built-in rate limiting (e.g. Redis-backed) is a planned feature; for now, limits are managed through each LLM provider's own quotas.

Network Optimization

1. Enable connection pooling

Already configured in PentAGI for database connections.
2. Use HTTP/2

Modern reverse proxies such as nginx support HTTP/2:

```nginx
server {
    listen 443 ssl http2;
    server_name pentagi.example.com;

    ssl_certificate /etc/ssl/certs/pentagi.crt;
    ssl_certificate_key /etc/ssl/private/pentagi.key;

    location / {
        # HTTP/2 terminates at nginx; the upstream connection stays HTTP/1.1
        # so that WebSocket upgrades keep working
        proxy_pass https://localhost:8443;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Monitoring and Alerting

Grafana Dashboard Configuration

1. Import PentAGI dashboards

Pre-configured dashboards are available in `observability/grafana/dashboards/`:
  • PentAGI System Overview
  • Docker Metrics
  • PostgreSQL Performance
  • LLM Usage and Costs
2. Configure alert rules

Create alert rules in Grafana:

```yaml
# High CPU usage
- alert: HighCPUUsage
  expr: rate(container_cpu_usage_seconds_total{name="pentagi"}[5m]) > 0.8
  for: 5m
  annotations:
    summary: "PentAGI high CPU usage"

# High memory usage
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes{name="pentagi"} / container_spec_memory_limit_bytes{name="pentagi"} > 0.9
  for: 5m
  annotations:
    summary: "PentAGI high memory usage"

# Database connection issues
- alert: DatabaseConnectionFailure
  expr: pg_up == 0
  for: 1m
  annotations:
    summary: "PostgreSQL is down"
```
3. Configure notification channels

Set up notifications in Grafana:
  • Email
  • Slack
  • PagerDuty
  • Webhook

Log Analysis

1. Access logs via Loki

Query logs in Grafana using LogQL:

```
# All PentAGI errors
{container_name="pentagi"} |= "ERROR"

# Database query performance
{container_name="pgvector"} |= "slow query"

# LLM API errors
{container_name="pentagi"} |= "OpenAI" |= "error"
```
2. Export logs for analysis

```bash
# Export logs to a file
docker compose logs --since 24h pentagi > pentagi-$(date +%Y%m%d).log

# Search for errors
grep -i error pentagi-$(date +%Y%m%d).log
```
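
Once exported, the raw log can be summarized with standard tools. A small sketch, assuming ISO-style timestamps (`2026-02-20T14:05:33`) at the start of each line; the `summarize_errors` helper name is ours:

```shell
# summarize_errors FILE: count error lines per hour
# (cut keeps the YYYY-MM-DDTHH timestamp prefix)
summarize_errors() {
  grep -i error "$1" | cut -c1-13 | sort | uniq -c | sort -rn
}
```

Usage: `summarize_errors pentagi-20260220.log` prints the busiest error hours first.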

Disaster Recovery

Backup Strategy

**Daily Backups**: Automated daily backups of databases and configuration

**Weekly Archives**: Full system snapshots retained for 4 weeks

**Offsite Storage**: Critical backups replicated to remote storage

**Point-in-Time Recovery**: PostgreSQL WAL archiving for recovery to any point

Recovery Procedures

**Complete system failure:**

  1. Provision new infrastructure
  2. Install Docker and Docker Compose
  3. Restore `.env` configuration from backup
  4. Restore PostgreSQL database
  5. Restore Neo4j database (if using Graphiti)
  6. Start all services
  7. Verify functionality
  8. Update DNS if needed

**Database corruption:**

  1. Stop affected services
  2. Restore database from the most recent backup
  3. Replay transaction logs if available
  4. Verify data integrity
  5. Restart services
  6. Monitor for issues

**Certificate expiration:**

  1. Generate new certificates
  2. Update `.env` with new certificate paths
  3. Restart the PentAGI service
  4. Verify HTTPS access
  5. Update monitoring to alert 30 days before expiration
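
The 30-day expiration alert in the last step can be scripted with `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds. A sketch; the `check_cert_expiry` helper name is ours:

```shell
# check_cert_expiry FILE: succeed only if FILE is valid for at least 30 more days
check_cert_expiry() {
  openssl x509 -checkend $((30 * 24 * 3600)) -noout -in "$1"
}
```

Usage: `check_cert_expiry /etc/letsencrypt/live/pentagi.example.com/fullchain.pem || send_alert`, where `send_alert` stands in for whatever notification hook you use.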

Compliance and Auditing

Audit Logging

1. Enable audit logs

PentAGI automatically logs all actions. Configure retention in `.env`:

```bash
# Set log retention in days
PENTAGI_LOG_RETENTION_DAYS=90
```
2. Export audit logs

```bash
# Export logs for compliance
docker compose exec pgvector psql -U postgres -d pentagidb -c \
  "COPY (SELECT * FROM audit_logs WHERE created_at > NOW() - INTERVAL '30 days') \
  TO STDOUT CSV HEADER" > audit_$(date +%Y%m).csv
```

Data Retention Policies

Configure in `.env`:

```bash
# Retain task results for 90 days
TASK_RETENTION_DAYS=90

# Retain worker logs for 30 days
WORKER_LOG_RETENTION_DAYS=30

# Retain LLM traces for 60 days
LLM_TRACE_RETENTION_DAYS=60
```

Next Steps

**Troubleshooting**: Common production issues and solutions

**Scaling Guide**: Scale PentAGI for larger workloads

**Security Hardening**: Advanced security configurations

**Monitoring**: Deep dive into monitoring and metrics
