## Production Checklist

**Security Hardening**

- [ ] Change all default passwords and secrets
- [ ] Generate new API keys and tokens
- [ ] Configure SSL/TLS certificates
- [ ] Enable authentication for all services

**High Availability**

- [ ] Deploy worker nodes on separate hardware
- [ ] Configure backup and restore procedures
- [ ] Set up database replication (if needed)
- [ ] Configure health checks and auto-restart

**Monitoring & Observability**

- [ ] Deploy full observability stack
- [ ] Enable metrics collection

**Resource Planning**

- [ ] Size compute resources appropriately
- [ ] Configure resource limits
## Security Hardening

### Change Default Credentials

**Critical:** all default passwords, salts, and secrets MUST be changed before production deployment.

```bash
# PentAGI security
COOKIE_SIGNING_SALT=$(openssl rand -hex 32)
PUBLIC_URL=https://pentagi.example.com
CORS_ORIGINS=https://pentagi.example.com

# PostgreSQL
PENTAGI_POSTGRES_USER=pentagi_prod
PENTAGI_POSTGRES_PASSWORD=$(openssl rand -base64 32)
PENTAGI_POSTGRES_DB=pentagidb

# Neo4j (for Graphiti)
NEO4J_USER=neo4j
NEO4J_PASSWORD=$(openssl rand -base64 32)

# Scraper
LOCAL_SCRAPER_USERNAME=pentagi_scraper
LOCAL_SCRAPER_PASSWORD=$(openssl rand -base64 32)

# Langfuse security
LANGFUSE_SALT=$(openssl rand -hex 32)
LANGFUSE_ENCRYPTION_KEY=$(openssl rand -hex 32)
LANGFUSE_NEXTAUTH_SECRET=$(openssl rand -hex 32)

# Langfuse database
LANGFUSE_POSTGRES_USER=langfuse_prod
LANGFUSE_POSTGRES_PASSWORD=$(openssl rand -base64 32)
LANGFUSE_CLICKHOUSE_USER=clickhouse
LANGFUSE_CLICKHOUSE_PASSWORD=$(openssl rand -base64 32)

# Langfuse storage
LANGFUSE_S3_ACCESS_KEY_ID=$(openssl rand -hex 16)
LANGFUSE_S3_SECRET_ACCESS_KEY=$(openssl rand -hex 32)
LANGFUSE_REDIS_AUTH=$(openssl rand -base64 32)

# Langfuse admin
LANGFUSE_INIT_USER_EMAIL=admin@example.com
LANGFUSE_INIT_USER_PASSWORD=$(openssl rand -base64 24)

# Langfuse API keys
LANGFUSE_INIT_PROJECT_PUBLIC_KEY=pk-lf-$(uuidgen)
LANGFUSE_INIT_PROJECT_SECRET_KEY=sk-lf-$(uuidgen)
```
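After regenerating secrets, it helps to confirm that nothing placeholder-like survived in the env file. A minimal sketch (the `check_env` helper and its pattern list are illustrative, not part of PentAGI; extend the patterns for your own defaults):

```shell
#!/bin/sh
# Fail if any value in an env file still looks like a placeholder or is empty.
check_env() {
  env_file="$1"
  if grep -Eq '(changeme|your_client_id|your_client_secret|=[[:space:]]*$)' "$env_file"; then
    echo "WEAK"
    return 1
  fi
  echo "OK"
}
```

Running `check_env .env` before the first production start gives a non-zero exit on weak values, which makes it easy to wire into CI.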
### SSL/TLS Configuration

**Obtain SSL certificates**

Use Let's Encrypt for free certificates:

```bash
# Install certbot
sudo apt-get install certbot

# Obtain certificate
sudo certbot certonly --standalone \
  -d pentagi.example.com \
  --email admin@example.com \
  --agree-tos

# Certificates will be in:
# /etc/letsencrypt/live/pentagi.example.com/
```

**Configure PentAGI to use certificates**

Edit `.env`:

```bash
PUBLIC_URL=https://pentagi.example.com
SERVER_SSL_CRT=/etc/letsencrypt/live/pentagi.example.com/fullchain.pem
SERVER_SSL_KEY=/etc/letsencrypt/live/pentagi.example.com/privkey.pem
SERVER_USE_SSL=true
```

Update `docker-compose.yml` to mount the certificates:

```yaml
services:
  pentagi:
    volumes:
      - /etc/letsencrypt/live/pentagi.example.com:/etc/ssl/pentagi:ro
```

**Set up automatic renewal**

```bash
# Test renewal
sudo certbot renew --dry-run

# Add cron job for auto-renewal
sudo crontab -e

# Add this line (--deploy-hook restarts PentAGI only after an actual renewal):
0 0 * * 0 certbot renew --quiet --deploy-hook "docker compose -f /opt/pentagi/docker-compose.yml restart pentagi"
```
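To catch renewal failures before they bite, you can also probe the certificate file directly. A sketch using GNU `date` (the helper name, path, and 30-day threshold are examples, not PentAGI tooling):

```shell
#!/bin/sh
# Print the number of days until a PEM certificate expires.
days_until_expiry() {
  cert="$1"
  end=$(openssl x509 -enddate -noout -in "$cert" | cut -d= -f2)
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Example cron-friendly check (threshold is an assumption):
# [ "$(days_until_expiry /etc/letsencrypt/live/pentagi.example.com/fullchain.pem)" -gt 30 ] || echo "renew soon"
```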
### Firewall Configuration

Main node:

```bash
# Enable UFW
sudo ufw enable

# Allow SSH (adjust port if needed)
sudo ufw allow 22/tcp

# Allow HTTPS for PentAGI
sudo ufw allow 443/tcp
sudo ufw allow 8443/tcp

# Allow from monitoring network (adjust subnet)
sudo ufw allow from 10.0.0.0/24 to any port 3000 comment 'Grafana'
sudo ufw allow from 10.0.0.0/24 to any port 4000 comment 'Langfuse'

# Deny all other incoming by default
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Reload to apply the rules
sudo ufw reload
```

Worker nodes:

```bash
# Enable UFW
sudo ufw enable

# Allow SSH
sudo ufw allow 22/tcp

# Allow Docker API from the main node only
sudo ufw allow from <MAIN_NODE_IP> to any port 2376 proto tcp comment 'Docker API'
sudo ufw allow from <MAIN_NODE_IP> to any port 3376 proto tcp comment 'dind API'

# Allow metrics from monitoring network
sudo ufw allow from 10.0.0.0/24 to any port 9323 comment 'Docker metrics'
sudo ufw allow from 10.0.0.0/24 to any port 9324 comment 'dind metrics'

# Allow OOB attack ports (adjust based on target networks)
sudo ufw allow 28000:30000/tcp comment 'OOB attacks'
sudo ufw allow 28000:30000/udp comment 'OOB attacks'

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw reload
```
### Authentication and Access Control

**Enable OAuth2 authentication**

Configure OAuth2 providers in `.env`:

```bash
# Google OAuth
OAUTH_GOOGLE_CLIENT_ID=your_client_id
OAUTH_GOOGLE_CLIENT_SECRET=your_client_secret

# GitHub OAuth
OAUTH_GITHUB_CLIENT_ID=your_client_id
OAUTH_GITHUB_CLIENT_SECRET=your_client_secret
```

**Disable the default admin account**

After creating OAuth users, disable the default admin account via the UI.

**Configure Langfuse SSO**

For enterprise deployments:

```bash
# Custom OAuth2 for Langfuse
LANGFUSE_AUTH_CUSTOM_CLIENT_ID=your_client_id
LANGFUSE_AUTH_CUSTOM_CLIENT_SECRET=your_client_secret
LANGFUSE_AUTH_CUSTOM_ISSUER=https://auth.example.com
LANGFUSE_AUTH_DISABLE_SIGNUP=true
```
## High Availability

### Distributed Architecture

For production deployments, use the distributed worker node architecture to isolate penetration testing workloads.

Recommended topology:

```
     ┌─────────────────┐
     │  Load Balancer  │
     │ (nginx/HAProxy) │
     └────────┬────────┘
              │
         ┌────┴────┐
         │         │
    ┌────▼───┐ ┌───▼────┐
    │ Main   │ │ Main   │
    │ Node 1 │ │ Node 2 │
    └────┬───┘ └───┬────┘
         │         │
         └────┬────┘
              │
         ┌────┴────┐
         │         │
    ┌────▼───┐ ┌───▼────┐
    │ Worker │ │ Worker │
    │ Node 1 │ │ Node 2 │
    └────────┘ └────────┘
```
### Database Backup and Restore

#### PostgreSQL Backup

```bash
# Create backup directory
mkdir -p /opt/pentagi/backups

# Automated backup script
cat > /opt/pentagi/backup-postgres.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/opt/pentagi/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

docker compose exec -T pgvector pg_dump -U postgres pentagidb | \
  gzip > "${BACKUP_DIR}/pentagi_${TIMESTAMP}.sql.gz"

# Keep only the last 7 days
find "${BACKUP_DIR}" -name "pentagi_*.sql.gz" -mtime +7 -delete
EOF

chmod +x /opt/pentagi/backup-postgres.sh

# Add to cron (daily at 2 AM), preserving any existing crontab entries
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/pentagi/backup-postgres.sh") | crontab -
```

Restore:

```bash
# Stop PentAGI
docker compose stop pentagi

# Restore backup
gunzip -c pentagi_20260220_020000.sql.gz | \
  docker compose exec -T pgvector psql -U postgres -d pentagidb

# Restart PentAGI
docker compose start pentagi
```
#### Neo4j Backup

```bash
# Backup Neo4j database
docker compose exec neo4j neo4j-admin database dump neo4j \
  --to-path=/backups/neo4j_$(date +%Y%m%d_%H%M%S).dump

# Copy from container
docker cp neo4j:/backups /opt/pentagi/backups/neo4j/
```

Restore:

```bash
# Stop Graphiti services
docker compose -f docker-compose-graphiti.yml stop

# Restore database
docker compose exec neo4j neo4j-admin database load neo4j \
  --from-path=/backups/neo4j_20260220_020000.dump

# Restart services
docker compose -f docker-compose-graphiti.yml start
```
### Health Checks and Auto-Restart

**Configure Docker health checks**

All PentAGI services already include health checks in `docker-compose.yml`:

```yaml
services:
  pentagi:
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "https://localhost:8443/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```
**Monitor with an external watchdog**

Install and configure a watchdog service:

```bash
# Install monit
sudo apt-get install monit

# Configure PentAGI monitoring (tee runs the write as root; a bare redirect would not)
cat << 'EOF' | sudo tee /etc/monit/conf.d/pentagi > /dev/null
check host pentagi with address localhost
  if failed
    port 8443
    protocol https
    request /health
    with timeout 10 seconds
    for 3 cycles
  then exec "/usr/bin/docker compose -f /opt/pentagi/docker-compose.yml restart pentagi"
EOF

sudo systemctl restart monit
```
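If monit feels heavy, the same restart-after-N-failures logic can be sketched in plain shell (the function and its parameters are ours; in practice the probe would be something like `curl -fsk https://localhost:8443/health` and the action `docker compose restart pentagi`):

```shell
#!/bin/sh
# Run the probe $3 times; if every attempt fails, run the recovery action.
watch_once() {
  probe="$1"; action="$2"; retries="${3:-3}"
  fails=0; i=0
  while [ "$i" -lt "$retries" ]; do
    $probe || fails=$((fails + 1))
    i=$((i + 1))
  done
  if [ "$fails" -eq "$retries" ]; then
    $action
    echo "restarted"
  else
    echo "healthy"
  fi
}
```

Requiring every probe in the window to fail mirrors monit's `for 3 cycles` semantics, so a single transient timeout does not trigger a restart.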
## Resource Planning

### Recommended Resources

**Small deployment** (1-2 concurrent users, basic penetration testing):

| Component   | CPU    | Memory | Storage    |
|-------------|--------|--------|------------|
| Main Node   | 4 vCPU | 8 GB   | 100 GB SSD |
| Worker Node | 4 vCPU | 8 GB   | 50 GB SSD  |
| **Total**   | 8 vCPU | 16 GB  | 150 GB     |

**Medium deployment** (5-10 concurrent users, moderate workload):

| Component     | CPU     | Memory | Storage    |
|---------------|---------|--------|------------|
| Main Node     | 8 vCPU  | 16 GB  | 250 GB SSD |
| Worker Node 1 | 8 vCPU  | 16 GB  | 100 GB SSD |
| Worker Node 2 | 8 vCPU  | 16 GB  | 100 GB SSD |
| **Total**     | 24 vCPU | 48 GB  | 450 GB     |

**Large deployment** (20+ concurrent users, heavy workload):

| Component     | CPU     | Memory | Storage    |
|---------------|---------|--------|------------|
| Main Node 1   | 16 vCPU | 32 GB  | 500 GB SSD |
| Main Node 2   | 16 vCPU | 32 GB  | 500 GB SSD |
| Worker Node 1 | 16 vCPU | 32 GB  | 200 GB SSD |
| Worker Node 2 | 16 vCPU | 32 GB  | 200 GB SSD |
| Worker Node 3 | 16 vCPU | 32 GB  | 200 GB SSD |
| **Total**     | 80 vCPU | 160 GB | 1.6 TB     |
### Resource Limits

Edit `docker-compose.yml` to add resource constraints:

```yaml
services:
  pentagi:
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G

  pgvector:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

  scraper:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
```
### Storage Management

**Monitor disk usage**

```bash
# Check Docker disk usage
docker system df -v

# Check volume sizes
docker volume ls --format '{{.Name}}' | \
  xargs -I {} sh -c 'echo "Volume: {}"; docker volume inspect {} --format "{{.Mountpoint}}" | xargs du -sh'
```

**Configure log rotation**

Already configured in `docker-compose.yml`:

```yaml
services:
  pentagi:
    logging:
      options:
        max-size: 50m
        max-file: "7"
```

**Clean up old data**

```bash
# Remove unused images older than 7 days (168h)
docker image prune -a --filter "until=168h"

# Remove unused volumes (careful!)
docker volume prune --filter "label!=keep"

# Remove old worker containers, keeping the 10 most recent
docker ps -a | grep pentagi-terminal | \
  awk '{if (NR>10) print $1}' | xargs docker rm
```
### Database Tuning

**Configure PostgreSQL performance**

Create a `postgresql.conf` override (the values below assume roughly 8 GB of RAM on the main node):

```ini
# /opt/pentagi/postgres/postgresql.conf
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10MB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
```

Mount it in `docker-compose.yml`:

```yaml
services:
  pgvector:
    volumes:
      - ./postgres/postgresql.conf:/etc/postgresql/postgresql.conf:ro
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
```
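The memory settings above follow the common rule of thumb of `shared_buffers` at about 25% of RAM and `effective_cache_size` at about 75%. A sketch to recompute them for a different node size (the helper is illustrative, and the percentages are general guidance rather than PentAGI-specific values):

```shell
#!/bin/sh
# Derive rough PostgreSQL memory settings from a RAM budget in GB.
pg_mem_hints() {
  ram_gb="$1"
  echo "shared_buffers=$((ram_gb * 25 / 100))GB effective_cache_size=$((ram_gb * 75 / 100))GB"
}
```

`pg_mem_hints 8` reproduces the 2 GB / 6 GB pair used in the override above; for a 16 GB medium-deployment node it suggests 4 GB / 12 GB.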
Optimize vector search
-- Connect to database
docker compose exec pgvector psql - U postgres - d pentagidb
-- Create indexes for vector search
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100 );
-- Analyze tables
ANALYZE memories;
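`lists = 100` suits a table of roughly 100k rows. pgvector's documented heuristic is `lists` ≈ rows/1000 for up to about 1M rows and ≈ sqrt(rows) beyond that; a quick calculator (the helper name is ours):

```shell
#!/bin/sh
# Suggest an ivfflat "lists" value from the row count (pgvector heuristic).
ivfflat_lists() {
  rows="$1"
  if [ "$rows" -le 1000000 ]; then
    echo $((rows / 1000))
  else
    awk -v r="$rows" 'BEGIN { printf "%d\n", sqrt(r) }'
  fi
}
```

Rebuild the index with a new `lists` value after large data growth, since ivfflat clusters are fixed at index creation time.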
## LLM Provider Optimization

**Rate limiting**

Configure rate limits to prevent API quota exhaustion:

```bash
# Use Redis for rate limiting (future feature)
# Currently managed by LLM provider quotas
```

**Caching**

Enable response caching to reduce API costs:

```bash
# Future feature: LLM response caching
# Will use Redis for cache storage
```

**Model selection**

Choose cost-effective models for different agent types:

```yaml
# example.custom.provider.yml
agents:
  researcher:
    model: gpt-4o-mini  # Cheaper model for research
  developer:
    model: gpt-4o       # Better model for code
  executor:
    model: gpt-4o-mini  # Cheaper for execution
```
## Network Optimization

**Enable connection pooling**

Already configured in PentAGI for database connections.

**Use HTTP/2**

Modern reverse proxies such as nginx support HTTP/2:

```nginx
server {
    listen 443 ssl http2;
    server_name pentagi.example.com;

    ssl_certificate /etc/ssl/certs/pentagi.crt;
    ssl_certificate_key /etc/ssl/private/pentagi.key;

    location / {
        proxy_pass https://localhost:8443;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
## Monitoring and Alerting

### Grafana Dashboard Configuration

**Import PentAGI dashboards**

Pre-configured dashboards are available in `observability/grafana/dashboards/`:

- PentAGI System Overview
- Docker Metrics
- PostgreSQL Performance
- LLM Usage and Costs

**Configure alert rules**

Create alert rules in Grafana:

```yaml
# High CPU usage
- alert: HighCPUUsage
  expr: rate(container_cpu_usage_seconds_total{name="pentagi"}[5m]) > 0.8
  for: 5m
  annotations:
    summary: "PentAGI high CPU usage"

# High memory usage
- alert: HighMemoryUsage
  expr: container_memory_usage_bytes{name="pentagi"} / container_spec_memory_limit_bytes{name="pentagi"} > 0.9
  for: 5m
  annotations:
    summary: "PentAGI high memory usage"

# Database connection issues
- alert: DatabaseConnectionFailure
  expr: pg_up == 0
  for: 1m
  annotations:
    summary: "PostgreSQL is down"
```

**Configure notification channels**

Set up notifications in Grafana via Email, Slack, PagerDuty, or a custom webhook.
### Log Analysis

**Access logs via Loki**

Query logs in Grafana using LogQL:

```
# All PentAGI errors
{container_name="pentagi"} |= "ERROR"

# Database query performance
{container_name="pgvector"} |= "slow query"

# LLM API errors
{container_name="pentagi"} |= "OpenAI" |= "error"
```

**Export logs for analysis**

```bash
# Export logs to file
docker compose logs --since 24h pentagi > pentagi-$(date +%Y%m%d).log

# Search for errors
grep -i error pentagi-$(date +%Y%m%d).log
```
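For quick triage of an exported log, a per-hour error histogram is often enough. A sketch assuming ISO-style timestamps at the start of each line (the helper name is ours; adjust the prefix length if your log format differs):

```shell
#!/bin/sh
# Count error lines per hour, keyed by the "YYYY-MM-DDTHH" prefix.
errors_per_hour() {
  grep -i error "$1" | awk '{ h = substr($1, 1, 13); count[h]++ }
    END { for (h in count) print count[h], h }' | sort -k2
}
```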
## Disaster Recovery

### Backup Strategy

- **Daily backups:** automated daily backups of databases and configuration
- **Weekly archives:** full system snapshots retained for 4 weeks
- **Offsite storage:** critical backups replicated to remote storage
- **Point-in-time recovery:** PostgreSQL WAL archiving for recovery to any point

### Recovery Procedures

Full system recovery:

1. Provision new infrastructure
2. Install Docker and Docker Compose
3. Restore `.env` configuration from backup
4. Restore the PostgreSQL database
5. Restore the Neo4j database (if using Graphiti)
6. Start all services
7. Verify functionality
8. Update DNS if needed

Database corruption:

1. Stop affected services
2. Restore the database from the most recent backup
3. Replay transaction logs if available
4. Verify data integrity
5. Restart services
6. Monitor for issues

Certificate expiration:

1. Generate new certificates
2. Update `.env` with the new certificate paths
3. Restart the PentAGI service
4. Verify HTTPS access
5. Update monitoring to alert 30 days before expiration
## Compliance and Auditing

### Audit Logging

**Enable audit logs**

PentAGI automatically logs all actions. Configure retention:

```bash
# Set log retention in days
PENTAGI_LOG_RETENTION_DAYS=90
```

**Export audit logs**

```bash
# Export logs for compliance
docker compose exec pgvector psql -U postgres -d pentagidb -c \
  "COPY (SELECT * FROM audit_logs WHERE created_at > NOW() - INTERVAL '30 days') \
  TO STDOUT CSV HEADER" > audit_$(date +%Y%m).csv
```
### Data Retention Policies

Configure in `.env`:

```bash
# Retain task results for 90 days
TASK_RETENTION_DAYS=90

# Retain worker logs for 30 days
WORKER_LOG_RETENTION_DAYS=30

# Retain LLM traces for 60 days
LLM_TRACE_RETENTION_DAYS=60
```
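The `*_RETENTION_DAYS` settings are applied by PentAGI itself; for ad-hoc artifacts such as exported logs and compliance CSVs, a small mtime-based sweep does the same job (the helper name is ours):

```shell
#!/bin/sh
# Delete files older than $2 days under $1 and report how many were removed.
purge_older_than() {
  dir="$1"; days="$2"
  find "$dir" -type f -mtime +"$days" -print -delete | wc -l
}
```

For example, `purge_older_than /opt/pentagi/exports 30` keeps exports in step with the 30-day worker-log policy.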
## Next Steps

- **Troubleshooting**: common production issues and solutions
- **Scaling Guide**: scale PentAGI for larger workloads
- **Security Hardening**: advanced security configurations
- **Monitoring**: deep dive into monitoring and metrics