Overview

PentAGI employs a sophisticated multi-agent architecture where specialized AI agents collaborate to conduct penetration tests. Each agent has distinct capabilities, tool access, and reasoning patterns optimized for specific phases of security testing.
The multi-agent system is optional and can be disabled per assistant. When disabled, a single agent handles all operations directly.

Agent Roles

The system features specialized agents that work together in a coordinated workflow:

Researcher Agent

Purpose: Information gathering, reconnaissance, and vulnerability analysis
Capabilities:
  • Web intelligence gathering through integrated browser and search APIs
  • Target enumeration and service discovery
  • Vulnerability database queries
  • OSINT (Open Source Intelligence) collection
  • Security advisory research
Tools:
  • Search engines (Tavily, Traversaal, Perplexity, DuckDuckGo, Google, Searxng)
  • Web scraper with isolated browser
  • Memory search for historical reconnaissance data
  • Knowledge graph queries for similar targets
Reasoning Pattern: Broad exploratory analysis focusing on gathering comprehensive information about targets before exploitation attempts.

Developer Agent

Purpose: Attack planning, payload development, and exploit adaptation
Capabilities:
  • Exploit development and customization
  • Attack chain planning
  • Tool selection and configuration
  • Payload crafting for specific vulnerabilities
  • Technique adaptation based on target environment
Tools:
  • Memory search for successful exploit patterns
  • Knowledge graph queries for attack relationships
  • Access to exploit databases and tool documentation
  • Code generation capabilities for custom exploits
Reasoning Pattern: Strategic planning with emphasis on creating targeted, effective attack approaches based on researcher findings.

Executor Agent

Purpose: Command execution, tool operation, and result validation
Capabilities:
  • Security tool execution (nmap, metasploit, sqlmap, etc.)
  • Command-line operations in sandboxed environment
  • Output analysis and validation
  • Result documentation
  • Error handling and retry logic
Tools:
  • 20+ professional pentesting tools in sandboxed containers
  • Shell access for custom commands
  • Memory storage for execution results
  • Knowledge graph updates with findings
Reasoning Pattern: Precise execution with focus on command accuracy, output interpretation, and failure recovery.

Agent Coordination

Agents communicate through a structured delegation system:

Delegation Process

1. Task Analysis: The orchestrator analyzes the user request and current context to determine which agent is most appropriate.
2. Context Preparation: Relevant information from memory and the knowledge graph is assembled for the specialized agent.
3. Agent Invocation: The selected agent receives:
  • Specific task description
  • Available tools and their schemas
  • Historical context from similar operations
  • Constraints and safety parameters
4. Execution: The agent performs its specialized function using available tools and reasoning capabilities.
5. Result Integration: Outputs are stored in memory/knowledge graph and returned to the orchestrator.
6. Continuation: The orchestrator decides whether to delegate further tasks or synthesize results.
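The delegation cycle above can be sketched as a simple orchestrator loop. The types and function names here are illustrative stand-ins, not PentAGI's actual API; a real cycle passes tool schemas, memory context, and safety constraints to the agent as well:

```go
package main

import "fmt"

// Agent is a minimal stand-in for a specialized agent.
type Agent interface {
	Name() string
	Execute(task string) string
}

type researcher struct{}

func (researcher) Name() string { return "researcher" }
func (researcher) Execute(task string) string {
	return "recon results for: " + task
}

// selectAgent models step 1 (task analysis): pick the agent best suited
// to the request. A trivial rule stands in for the orchestrator's LLM call,
// which chooses among researcher, developer, and executor.
func selectAgent(task string) Agent {
	return researcher{}
}

// delegate models steps 2-5: prepare context, invoke the selected agent,
// and return its output for integration into memory and the knowledge graph.
func delegate(task string) string {
	agent := selectAgent(task)
	result := agent.Execute(task)
	fmt.Printf("[%s] %s\n", agent.Name(), result)
	return result
}

func main() {
	delegate("enumerate services on target host")
}
```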

Agent Configuration

Each agent can be configured with different LLM models optimized for their specific roles:
# Example agent configuration from provider YAML
researcher:
  model: "gpt-4.1"
  temperature: 0.7
  max_tokens: 4096
  reasoning: "extended"  # For complex research analysis

developer:
  model: "claude-sonnet-4"
  temperature: 0.3
  max_tokens: 8192
  reasoning: "standard"  # For strategic planning

executor:
  model: "gpt-4.1-mini"
  temperature: 0.1
  max_tokens: 2048
  reasoning: "none"  # Fast, deterministic execution

Model Selection Considerations

Researcher: Benefits from models with strong reasoning capabilities and broad knowledge (e.g., Claude Sonnet, GPT-4.1, Gemini Pro).
Developer: Requires models with excellent code generation and strategic thinking (e.g., Claude Sonnet, DeepSeek Coder, GPT-4.1).
Executor: Optimized for speed and accuracy; smaller models are often sufficient (e.g., GPT-4.1-mini, Claude Haiku, Gemini Flash).

Tool Access Control

Agents have different tool permissions based on their roles. For example, the Researcher's tool schema exposes functions like:
{
  "search_web": "Search the internet for information",
  "search_memory": "Query historical research findings",
  "search_knowledge_graph": "Find related vulnerabilities and targets",
  "browse_url": "Visit and analyze web pages",
  "analyze_service": "Examine service configurations"
}
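One way to express role-based tool permissions is an allow-list per agent type. The researcher entry below reuses the tool names from the example above; the developer and executor entries, and the registry shape itself, are illustrative assumptions rather than PentAGI's actual tool registry:

```go
package main

import "fmt"

// toolAllowList maps each agent role to the tools it may call.
var toolAllowList = map[string][]string{
	"researcher": {"search_web", "search_memory", "search_knowledge_graph", "browse_url", "analyze_service"},
	"developer":  {"search_memory", "search_knowledge_graph"},
	"executor":   {"run_command", "search_memory"},
}

// canUse reports whether the given agent role is permitted to call a tool.
func canUse(role, tool string) bool {
	for _, t := range toolAllowList[role] {
		if t == tool {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canUse("researcher", "browse_url")) // true
	fmt.Println(canUse("executor", "browse_url"))   // false
}
```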

Agent Communication Patterns

Sequential Delegation

Most common pattern where agents work in sequence:
  1. Researcher gathers target information
  2. Developer creates attack plan based on findings
  3. Executor runs the planned attacks
  4. Orchestrator synthesizes results
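The sequential pattern can be viewed as a pipeline in which each stage consumes the previous stage's output. The stage functions below are placeholders; in practice every step is a full agent invocation with its own tools and context:

```go
package main

import "fmt"

// Each stage stands in for a full agent invocation.
func research(target string) string { return "open ports on " + target }
func plan(findings string) string   { return "attack plan for: " + findings }
func execute(p string) string       { return "executed: " + p }

// runSequential chains the three agents in the order described above,
// then returns the combined result for the orchestrator to synthesize.
func runSequential(target string) string {
	findings := research(target)
	attack := plan(findings)
	return execute(attack)
}

func main() {
	fmt.Println(runSequential("10.0.0.5"))
}
```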

Iterative Refinement

Agents may be called multiple times with updated context. For example, if an exploit attempt fails, the Developer can be re-invoked with the Executor's error output to revise the attack plan before another run.

Parallel Investigation

For complex targets, multiple researchers may investigate different aspects simultaneously (planned feature).

Memory and Context Management

Agent-Specific Memory

Each agent type can filter memory searches to retrieve relevant past experiences:
// From memory.go implementation
filters := map[string]any{
    "flow_id":  flowID,
    "doc_type": "memory",
    "task_id":  taskID,      // Optional: specific task context
    "subtask_id": subtaskID, // Optional: specific subtask context
}

docs, err := store.SimilaritySearch(
    ctx,
    query,
    maxResults,
    vectorstores.WithScoreThreshold(0.2),
    vectorstores.WithFilters(filters),
)

Context Windows

Agents have different context window requirements:
  • Researcher: Large context (64K-100K tokens) for comprehensive analysis
  • Developer: Medium context (32K-64K tokens) for code and plans
  • Executor: Smaller context (8K-32K tokens) for focused operations

Chain Summarization

To manage growing conversation histories, PentAGI implements intelligent summarization:
  • Preserves recent messages in full detail
  • Summarizes older messages while maintaining critical information
  • Keeps tool calls and responses intact for debugging
  • Configurable thresholds per agent type
See Memory System for more details.
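A minimal version of threshold-based summarization keeps the newest messages verbatim, collapses the rest into a single summary entry, and never drops tool-call records. This is a sketch of the idea only, not PentAGI's implementation:

```go
package main

import "fmt"

type message struct {
	role       string
	content    string
	isToolCall bool // tool calls/responses are preserved for debugging
}

// summarize collapses messages older than the last keepRecent into one
// summary entry, while keeping all tool-call messages intact.
func summarize(history []message, keepRecent int) []message {
	if len(history) <= keepRecent {
		return history
	}
	cut := len(history) - keepRecent
	out := []message{{role: "system", content: fmt.Sprintf("summary of %d earlier messages", cut)}}
	for _, m := range history[:cut] {
		if m.isToolCall {
			out = append(out, m) // keep tool calls verbatim
		}
	}
	return append(out, history[cut:]...)
}

func main() {
	h := []message{
		{role: "user", content: "scan target"},
		{role: "assistant", content: "nmap output", isToolCall: true},
		{role: "user", content: "what next?"},
		{role: "assistant", content: "try sqlmap"},
	}
	for _, m := range summarize(h, 2) {
		fmt.Println(m.role+":", m.content)
	}
}
```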

Agent Performance Optimization

Tool Call ID Detection

The system automatically detects LLM provider-specific tool call ID patterns:
// From agents.go implementation
template, err := DetermineToolCallIDTemplate(
    ctx, provider, agentType, prompter,
)
// Examples:
// OpenAI: "call_{r:24:x}"
// Anthropic: "toolu_{r:24:b}"
// Gemini: "{f}:{r:1:d}"
This enables proper function call tracking and response correlation.

Retry Logic

Agents implement sophisticated retry mechanisms:
  • Function call failures trigger retries with error context
  • Model timeouts result in fallback models
  • Invalid responses prompt self-correction
  • Configurable max retries per operation

Barrier Functions

Certain tools act as “barrier functions” that require human approval:
  • Destructive operations (data deletion, service shutdown)
  • Potentially illegal actions (without explicit authorization)
  • Operations affecting production systems
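A barrier check can be modeled as a gate that refuses to run flagged tools without an explicit approval callback. This is a sketch of the pattern, not PentAGI's actual gating code; the tool names are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// barrierTools marks operations that must not run without human approval.
var barrierTools = map[string]bool{
	"delete_data":      true,
	"shutdown_service": true,
}

// runTool executes a tool, pausing for approval when the tool is a barrier
// function. approve is a stand-in for the human-in-the-loop prompt.
func runTool(name string, approve func(tool string) bool) (string, error) {
	if barrierTools[name] && !approve(name) {
		return "", errors.New("blocked by barrier: " + name)
	}
	return "ran " + name, nil
}

func main() {
	deny := func(string) bool { return false }
	if _, err := runTool("shutdown_service", deny); err != nil {
		fmt.Println(err) // blocked by barrier: shutdown_service
	}
	out, _ := runTool("port_scan", deny)
	fmt.Println(out) // ran port_scan
}
```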

Enabling/Disabling Agent Delegation

Agent delegation can be controlled at multiple levels:

Global Default

# In .env file
ASSISTANT_USE_AGENTS=false  # Disable by default
ASSISTANT_USE_AGENTS=true   # Enable by default

Per-Assistant Configuration

Users can toggle agent delegation in the UI when creating or editing assistants, overriding the global default.

When to Use Single vs Multi-Agent

Single Agent Mode (Faster):
  • Simple, straightforward tasks
  • Speed is critical
  • Minimal context switching needed
  • Direct command execution
Multi-Agent Mode (More Capable):
  • Complex penetration testing scenarios
  • Research-heavy operations
  • Multi-phase attack planning
  • Learning from diverse past experiences
