System Architecture

Overview

PentAGI is built on a modern microservices architecture designed for scalability, security, and observability. The system leverages Docker containers for isolation, PostgreSQL with pgvector for semantic search, and a comprehensive monitoring stack for production-grade deployments.

System Context

PentAGI operates as an autonomous penetration testing system that coordinates between security engineers, AI agents, target systems, and external services.

Container Architecture

The system is composed of multiple containerized services organized into functional groups:

Core Services

Frontend UI: React-based web interface with TypeScript for type safety, providing an intuitive dashboard for managing penetration tests, viewing results, and configuring assistants.

Backend API: Go-based REST and GraphQL APIs that handle:

Flow, task, and subtask management
Agent coordination and tool execution
Real-time WebSocket communication
Authentication and authorization

Vector Store: PostgreSQL with pgvector extension for:

Semantic similarity search using vector embeddings
Memory storage and retrieval (see Memory System)
Historical context matching
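As a rough illustration of what semantic similarity search does, the sketch below ranks stored memories by cosine similarity to a query embedding in plain Python. The three-dimensional vectors, the memory texts, and the `top_k` helper are hypothetical and chosen for readability. In PentAGI this kind of ranking is delegated to PostgreSQL with pgvector, and nothing here reflects the actual schema or queries.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, memories, k=2):
    """Return the k memory texts whose embeddings are most similar to the query."""
    ranked = sorted(
        memories,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hypothetical memories paired with toy 3-dimensional embeddings.
MEMORIES = [
    ("nmap found port 22 open", [0.9, 0.1, 0.0]),
    ("sqlmap confirmed injection on /login", [0.1, 0.9, 0.1]),
    ("masscan sweep of 10.0.0.0/24", [0.8, 0.2, 0.1]),
]

if __name__ == "__main__":
    # A query embedding that points in the "network scanning" direction.
    print(top_k([1.0, 0.0, 0.0], MEMORIES))
```

Real embeddings have hundreds or thousands of dimensions, so a database-side index is what makes this ranking practical at scale.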

Task Queue: Asynchronous task processing system for reliable operation and background job execution.

AI Agents: Multi-agent system with specialized roles for efficient penetration testing (see Agent System).

Knowledge Graph

Graphiti: Knowledge graph API that provides semantic relationship tracking and contextual understanding by automatically extracting structured knowledge from agent interactions.

Neo4j: Graph database storing:

Entity nodes (targets, tools, vulnerabilities, techniques)
Relationship edges (execution patterns, dependencies)
Temporal context and success patterns

See Knowledge Graph for detailed information.
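To make the node and edge vocabulary concrete, here is a minimal in-memory sketch of typed entity nodes connected by labeled relationship edges. The `KnowledgeGraph` class, the relation names `SCANNED` and `HAS_FINDING`, and the sample entities are all hypothetical. This is not the Graphiti API or a Neo4j query, only an illustration of the data shape.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny in-memory graph: typed nodes plus labeled, directed edges."""

    def __init__(self):
        self.nodes = {}                  # entity name -> entity type
        self.edges = defaultdict(list)   # source name -> [(relation, target name)]

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, source, relation, target):
        # Refuse edges that point at entities the graph does not know about.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((relation, target))

    def neighbors(self, source, relation):
        """Targets reachable from source over edges with the given relation."""
        return [t for r, t in self.edges.get(source, []) if r == relation]

def build_example():
    """Build a hypothetical graph with one target, one tool, and one finding."""
    g = KnowledgeGraph()
    g.add_node("10.0.0.5", "target")
    g.add_node("nmap", "tool")
    g.add_node("open-ssh", "vulnerability")
    g.add_edge("nmap", "SCANNED", "10.0.0.5")
    g.add_edge("10.0.0.5", "HAS_FINDING", "open-ssh")
    return g

if __name__ == "__main__":
    graph = build_example()
    print(graph.neighbors("nmap", "SCANNED"))
```

A graph database adds persistence, indexing, and a query language on top of this shape; the temporal context mentioned above would typically live as properties on nodes or edges.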

Monitoring Stack

OpenTelemetry: Unified observability data collection with automatic correlation of metrics, traces, and logs.
Grafana: Real-time visualization dashboards for system health, performance metrics, and security testing progress.
VictoriaMetrics: High-performance time-series database optimized for long-term metrics storage.
Jaeger: End-to-end distributed tracing for debugging multi-agent workflows and performance analysis.
Loki: Scalable log aggregation with automatic label extraction and efficient querying.

Analytics Platform

Langfuse: Advanced LLM observability providing:

Token usage tracking and cost analysis
Prompt performance evaluation
Agent interaction visualization
Model comparison metrics

ClickHouse: Column-oriented analytics warehouse for high-speed analytical queries on large datasets.
Redis: High-speed caching and rate limiting for API protection and performance optimization.
MinIO: S3-compatible object storage for artifacts, reports, and large binary data.

Security Tools

Web Scraper: Isolated browser environment based on Chromium for safe web interaction during reconnaissance.

Pentesting Tools: Comprehensive suite of 20+ professional security tools including:

Network scanners (nmap, masscan)
Exploitation frameworks (Metasploit)
Web application testing (sqlmap, nikto)
All executed in sandboxed Docker containers

Entity Relationship

The system uses a hierarchical data model for organizing penetration testing workflows:

Flow: Top-level penetration test session representing an entire security assessment.
Task: Major phase within a flow (e.g., reconnaissance, vulnerability scanning, exploitation).
SubTask: Specific operation assigned to a specialized agent, containing the agent type and execution context.
Action: Individual tool execution or analysis step, including command runs, searches, and data processing.
Artifact: Files, reports, logs, and other outputs produced during testing.
Memory: Semantic memories stored with vector embeddings for similarity search and context retrieval.
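The Flow, Task, SubTask, Action nesting can be sketched with plain dataclasses. The field names, the agent type strings, and the `count_actions` helper are assumptions made for illustration and do not mirror the real database schema. Artifact and Memory are left out because the text above does not say which level they attach to.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    description: str          # e.g. a command run, a search, or a processing step

@dataclass
class SubTask:
    agent_type: str           # which specialized agent handles this operation
    actions: List[Action] = field(default_factory=list)

@dataclass
class Task:
    phase: str                # e.g. "reconnaissance" or "exploitation"
    subtasks: List[SubTask] = field(default_factory=list)

@dataclass
class Flow:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def count_actions(self) -> int:
        """Total number of actions across every task and subtask in the flow."""
        return sum(len(s.actions) for t in self.tasks for s in t.subtasks)

def example_flow() -> Flow:
    """Build a hypothetical flow with one task, two subtasks, and three actions."""
    recon = Task("reconnaissance", [
        SubTask("searcher", [Action("look up target domain")]),
        SubTask("pentester", [Action("run nmap"), Action("run masscan")]),
    ])
    return Flow("demo assessment", [recon])

if __name__ == "__main__":
    print(example_flow().count_actions())
```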

Agent Interaction Flow

Agents collaborate through a choreographed workflow that integrates the vector store and the knowledge base.

The orchestrator maintains context across all phases, ensuring that insights from research inform development, and execution results feed back into the knowledge base for future operations.

Deployment Architecture

Single-Node Deployment

For development and small-scale testing:

All services run on a single host
Docker socket mounted for container management
Suitable for learning and proof-of-concept

Two-Node Production Deployment

For production and security-sensitive environments:

Control Node: UI, API, databases, monitoring
Worker Node: Isolated execution environment for untrusted code
Docker-in-Docker with TLS authentication
Network isolation for penetration testing operations
Dedicated port ranges for out-of-band attack techniques

See the Worker Node Setup Guide for detailed configuration.
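For orientation, a Docker client is commonly pointed at a remote TLS-protected daemon through the standard `DOCKER_HOST`, `DOCKER_TLS_VERIFY`, and `DOCKER_CERT_PATH` environment variables, with `ca.pem`, `cert.pem`, and `key.pem` in the certificate directory. The sketch below builds such an environment and checks for the certificate files. Whether PentAGI configures its worker connection this way is an assumption, and the hostname, directory, and helper names are hypothetical; the Worker Node Setup Guide remains the authority.

```python
from pathlib import Path

def docker_tls_env(worker_host, cert_dir, port=2376):
    """Build environment variables for a Docker client talking to a TLS daemon.

    Port 2376 is the conventional port for a TLS-protected Docker daemon.
    """
    return {
        "DOCKER_HOST": f"tcp://{worker_host}:{port}",
        "DOCKER_TLS_VERIFY": "1",
        "DOCKER_CERT_PATH": str(Path(cert_dir)),
    }

def missing_cert_files(cert_dir):
    """List which of the expected client PEM files are absent from cert_dir."""
    expected = ("ca.pem", "cert.pem", "key.pem")
    return [name for name in expected if not (Path(cert_dir) / name).is_file()]

if __name__ == "__main__":
    # Hypothetical worker hostname and certificate directory.
    env = docker_tls_env("worker.example.internal", "/etc/docker/certs")
    for key, value in env.items():
        print(f"{key}={value}")
    print("missing:", missing_cert_files("/etc/docker/certs"))
```

Checking for the certificate files before connecting turns a confusing TLS handshake failure into an explicit configuration error.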

Network Architecture

Services communicate through separate Docker networks for security boundaries:

pentagi-network: Core services (API, UI, database, agents)
observability-network: Monitoring stack (Grafana, Jaeger, Loki, VictoriaMetrics)
langfuse-network: Analytics platform (Langfuse, ClickHouse, Redis, MinIO)
graphiti-network: Knowledge graph services (Graphiti, Neo4j)
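The boundary idea can be expressed as a small membership check: two containers can reach each other directly only if they share at least one Docker network. The table below copies just the groupings listed above. A real deployment may attach a service to more than one network (for example, so the API can reach the analytics platform), which this sketch does not model, and the helper names are hypothetical.

```python
# Network membership as listed in this section; illustrative only.
NETWORKS = {
    "pentagi-network": {"API", "UI", "database", "agents"},
    "observability-network": {"Grafana", "Jaeger", "Loki", "VictoriaMetrics"},
    "langfuse-network": {"Langfuse", "ClickHouse", "Redis", "MinIO"},
    "graphiti-network": {"Graphiti", "Neo4j"},
}

def shared_networks(a, b):
    """Sorted names of the networks to which both services are attached."""
    return sorted(
        name for name, members in NETWORKS.items()
        if a in members and b in members
    )

def can_communicate(a, b):
    """True if the two services share at least one network."""
    return bool(shared_networks(a, b))

if __name__ == "__main__":
    print(can_communicate("API", "UI"))
    print(can_communicate("API", "Neo4j"))
```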

Production deployments should implement additional security measures including:

TLS certificates for all HTTPS endpoints
Strong authentication tokens and database passwords
Network policies restricting inter-service communication
Regular security updates for all containers

Scalability Considerations

Horizontal Scaling: Multiple worker nodes can be added to distribute pentesting workloads.
Database Replication: PostgreSQL supports read replicas for query performance.
Load Balancing: Frontend and API can run multiple instances behind a load balancer.
Queue Distribution: Task queue supports multiple consumers for parallel processing.
Storage Scaling: MinIO supports distributed object storage for artifact retention.
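The multiple-consumer pattern behind queue distribution can be sketched with Python's standard `queue` and `threading` modules: several consumers drain one shared queue, and each job is handled exactly once. The job names and the `run_consumers` helper are hypothetical, and PentAGI's actual task queue implementation is not described in this section.

```python
import queue
import threading

def run_consumers(jobs, num_consumers=3):
    """Process jobs with several consumer threads.

    Returns a dict mapping each job to the index of the consumer that handled it.
    """
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    handled = {}
    lock = threading.Lock()   # protects the shared result dict

    def consume(index):
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return        # queue drained, this consumer exits
            with lock:
                handled[job] = index
            q.task_done()

    threads = [
        threading.Thread(target=consume, args=(i,))
        for i in range(num_consumers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled

if __name__ == "__main__":
    result = run_consumers(["scan-a", "scan-b", "scan-c", "scan-d"])
    print(sorted(result))
```

Which consumer picks up which job is not deterministic; the guarantee is only that every job is taken by exactly one consumer.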

Related Concepts

Agent System - Multi-agent coordination and specialized roles
Memory System - Long-term memory and vector storage
Knowledge Graph - Semantic relationships and contextual learning
