Overview

PentAGI’s autonomous penetration testing system uses AI agents to plan, execute, and adapt security testing workflows. It delegates tasks to specialized agents, each focused on one aspect of penetration testing.

Multi-Agent System

Specialized AI agents working together as a coordinated team

Autonomous Execution

Self-directed testing with minimal human intervention

Adaptive Planning

Dynamic task generation based on real-time findings

Smart Memory

Long-term learning from past tests and successes

Multi-Agent Architecture

PentAGI uses a multi-agent system in which each agent has specialized capabilities:

Primary Agent (Orchestrator)

The primary agent coordinates the entire penetration testing flow, delegating tasks to specialist agents:
// Primary executor has access to all specialized agents
PrimaryExecutor:
  - done: Complete the task
  - ask: Ask user for input (optional)
  - advice: Get guidance from mentor
  - coder: Delegate code development
  - maintenance: Delegate tool installation
  - memorist: Search long-term memory
  - pentester: Delegate security testing
  - search: Delegate research tasks
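
The tool list above can be read as a dispatch table. The following is a minimal Python sketch of that idea, not PentAGI's actual implementation: the tool names and descriptions are copied from the list, while `PRIMARY_TOOLS`, `CONTROL_TOOLS`, and `dispatch` are hypothetical names introduced for illustration.

```python
# Hypothetical sketch: tool names come from the list above; everything
# else (the mapping name, the split into control/delegate) is assumed.
PRIMARY_TOOLS = {
    "done": "Complete the task",
    "ask": "Ask user for input (optional)",
    "advice": "Get guidance from mentor",
    "coder": "Delegate code development",
    "maintenance": "Delegate tool installation",
    "memorist": "Search long-term memory",
    "pentester": "Delegate security testing",
    "search": "Delegate research tasks",
}

# Assumption: these two tools end or pause the flow instead of
# handing work to a specialist agent.
CONTROL_TOOLS = {"done", "ask"}


def dispatch(tool_name: str, message: str) -> dict:
    """Classify one tool call made by the primary agent."""
    if tool_name not in PRIMARY_TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    kind = "control" if tool_name in CONTROL_TOOLS else "delegate"
    return {"kind": kind, "tool": tool_name, "message": message}
```

Rejecting unknown tool names up front keeps a misspelled agent name from silently turning into a no-op.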

Specialist Agents

Pentester

Purpose: Execute penetration testing attacks and vulnerability assessments
Capabilities:
  • Execute commands in sandboxed environment
  • Read and write files
  • Use browser for web reconnaissance
  • Search guides and code samples
  • Delegate to coder, installer, adviser, memorist, and searcher
Tools Available:
  • terminal: Execute security tools
  • file: Manage exploit files
  • browser: Web reconnaissance
  • search_guide: Find pentesting guides
  • store_guide: Save successful techniques
  • graphiti_search: Query knowledge graph for historical context
  • Agent delegation tools
Example Use Cases:
  • Exploit development and testing
  • Vulnerability validation
  • Post-exploitation activities

Coder

Purpose: Develop custom exploits, scripts, and tools
Capabilities:
  • Write code for specific tasks
  • Search code samples in memory
  • Store reusable code snippets
  • Browse documentation
  • Delegate to installer, adviser, memorist, and searcher
Tools Available:
  • browser: Access documentation
  • search_code: Find relevant code samples
  • store_code: Save working code
  • graphiti_search: Find previous code patterns
  • Agent delegation tools
Example Use Cases:
  • Custom exploit development
  • Automation script creation
  • Payload generation

Installer

Purpose: Maintain the environment and install security tools
Capabilities:
  • Execute commands in container
  • Install and configure tools
  • Manage dependencies
  • Browse installation documentation
  • Store installation guides
Tools Available:
  • terminal: Install packages and tools
  • file: Manage configuration files
  • browser: Access installation guides
  • search_guide: Find setup instructions
  • store_guide: Save installation procedures
  • Agent delegation tools
Example Use Cases:
  • Tool installation (nmap, metasploit, sqlmap)
  • Environment configuration
  • Dependency resolution

Searcher

Purpose: Research and gather intelligence from multiple sources
Capabilities:
  • Search multiple search engines
  • Browse web pages
  • Query vector database
  • Store research findings
Tools Available:
  • google: Fast queries for public links
  • duckduckgo: Anonymous searches
  • tavily: Detailed research reports
  • traversaal: Relevant web links
  • perplexity: Complex LLM-augmented research
  • searxng: Privacy-focused meta-search
  • browser: Deep web page analysis
  • search_answer: Query stored answers
  • store_answer: Save research findings
  • Delegate to memorist
Example Use Cases:
  • CVE research
  • Exploit discovery
  • Vulnerability documentation
  • Target reconnaissance

Memorist

Purpose: Search and retrieve information from long-term memory
Capabilities:
  • Search vector database with semantic queries
  • Query Graphiti knowledge graph
  • Execute commands to gather context
Tools Available:
  • search_in_memory: Semantic search in vector DB
  • graphiti_search: Query knowledge graph for:
    • Temporal windows (time-bounded searches)
    • Entity relationships (graph traversal)
    • Diverse results (anti-redundancy)
    • Episode context (full agent reasoning)
    • Successful tools (proven techniques)
    • Recent context (latest findings)
    • Entity by label (type-specific searches)
  • terminal: Execute commands for context
  • file: Read historical files
Example Use Cases:
  • Retrieve past successful exploits
  • Find similar vulnerability patterns
  • Access historical test results

Adviser

Purpose: Provide expert guidance and strategic advice
Capabilities:
  • Analyze complex situations
  • Suggest optimal approaches
  • Provide strategic recommendations
Example Use Cases:
  • Difficult decision-making
  • Complex attack planning
  • Troubleshooting failed attempts

Autonomous Execution

Task Execution Flow

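PentAGI runs each task in stages: the Generator Agent drafts a subtask plan, the Primary Agent executes the subtasks and delegates work to specialist agents, and the Refiner Agent modifies the subtasks based on execution results.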

Agent Delegation

Agents can delegate subtasks to other agents:
{
  "name": "pentester",
  "message": "Test SQL injection on target login form at http://target.com/login",
  "args": {
    "task": "Exploit SQL injection vulnerability",
    "target": "http://target.com/login",
    "context": "Login form with username and password fields"
  }
}
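
A delegation message like the one above could be built and checked before it is sent. This is a minimal sketch under assumptions: the field names `name`, `message`, and `args` come from the example, while the `Delegation` class, the `KNOWN_AGENTS` set, and the validation rules are hypothetical and not taken from PentAGI's code.

```python
import json
from dataclasses import asdict, dataclass, field

# Assumption: valid targets are the delegation tool names listed for
# the primary agent earlier in this document.
KNOWN_AGENTS = {"advice", "coder", "maintenance", "memorist", "pentester", "search"}


@dataclass
class Delegation:
    """Hypothetical container mirroring the JSON example above."""

    name: str
    message: str
    args: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Fail early on a misspelled agent or an empty instruction.
        if self.name not in KNOWN_AGENTS:
            raise ValueError(f"unknown agent: {self.name}")
        if not self.message.strip():
            raise ValueError("message must not be empty")
        return json.dumps(asdict(self), indent=2)


request = Delegation(
    name="pentester",
    message="Test SQL injection on target login form at http://target.com/login",
    args={
        "task": "Exploit SQL injection vulnerability",
        "target": "http://target.com/login",
        "context": "Login form with username and password fields",
    },
)
print(request.to_json())
```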

Adaptive Planning

Dynamic Subtask Generation

PentAGI uses a Generator Agent to create adaptive task plans:
GeneratorExecutor:
  - subtask_list: Submit new subtask plan
  - memorist: Query historical data
  - search: Research similar tests
  - terminal: Gather target information
  - browser: Reconnaissance

Subtask Refinement

The Refiner Agent modifies subtasks based on execution results:
Delta Operations
{
  "operations": [
    {
      "type": "add",
      "position": 2,
      "title": "Test XSS vulnerability",
      "description": "Found potential XSS in search form"
    },
    {
      "type": "modify",
      "id": 3,
      "updates": {
        "description": "Update: SQL injection confirmed, proceed with exploitation"
      }
    },
    {
      "type": "remove",
      "id": 5,
      "reason": "Already completed in previous step"
    },
    {
      "type": "reorder",
      "id": 4,
      "new_position": 1,
      "reason": "Higher priority"
    }
  ]
}
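
To make the four operation types concrete, here is a minimal sketch of how such a delta could be applied to a subtask list. It is not PentAGI's implementation. It assumes that each subtask is a dict with an `id`, that `position` and `new_position` are 1-based, and that new subtasks receive the next unused `id`; the function name `apply_operations` is hypothetical.

```python
from copy import deepcopy


def apply_operations(subtasks: list, operations: list) -> list:
    """Return a new plan with the delta operations applied in order."""
    plan = deepcopy(subtasks)
    # Assumption: a new subtask gets the next unused id.
    next_id = max((s["id"] for s in plan), default=0) + 1

    def index_of(subtask_id: int) -> int:
        for i, subtask in enumerate(plan):
            if subtask["id"] == subtask_id:
                return i
        raise KeyError(f"no subtask with id {subtask_id}")

    for op in operations:
        kind = op["type"]
        if kind == "add":
            new = {"id": next_id, "title": op["title"],
                   "description": op["description"]}
            next_id += 1
            # Assumption: positions are 1-based.
            plan.insert(max(op["position"] - 1, 0), new)
        elif kind == "modify":
            plan[index_of(op["id"])].update(op["updates"])
        elif kind == "remove":
            del plan[index_of(op["id"])]
        elif kind == "reorder":
            item = plan.pop(index_of(op["id"]))
            plan.insert(max(op["new_position"] - 1, 0), item)
        else:
            raise ValueError(f"unknown operation type: {kind}")
    return plan
```

Under these assumptions, applying the four operations shown above to a plan with ids 1 through 5 yields the order 4, 1, 6, 2, 3, where 6 is the newly added XSS subtask.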

Intelligent Memory

Vector Store Memory

Results from the following tools are stored in the vector store automatically:
allowedStoringInMemoryTools = [
  "terminal",      // Command outputs
  "file",          // File contents
  "search",        // Research results
  "google",        // Search results
  "duckduckgo",    // Search results
  "tavily",        // Research reports
  "traversaal",    // Web findings
  "perplexity",    // LLM-augmented research
  "searxng",       // Meta-search results
  "maintenance",   // Installation procedures
  "coder",         // Code solutions
  "pentester",     // Test results
  "advice",        // Expert guidance
]
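
One way to read this list is as a gate in front of the vector store. The sketch below illustrates that reading only: the tool names come from the list above and the `doc_type` and `tool_name` metadata keys from the query example later in this section, while `build_memory_record` and the `content` key are hypothetical.

```python
from typing import Optional

# Tool names copied from the allow-list above.
ALLOWED_STORING_IN_MEMORY_TOOLS = frozenset({
    "terminal", "file", "search", "google", "duckduckgo", "tavily",
    "traversaal", "perplexity", "searxng", "maintenance", "coder",
    "pentester", "advice",
})


def build_memory_record(tool_name: str, result: str) -> Optional[dict]:
    """Return a record to store, or None if the tool is not allow-listed."""
    if tool_name not in ALLOWED_STORING_IN_MEMORY_TOOLS:
        return None
    # Assumption: records carry the metadata keys used as search filters.
    return {"doc_type": "tool_result", "tool_name": tool_name, "content": result}
```

For example, `browser` does not appear in the list, so under this reading its output would not be stored automatically.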

Knowledge Graph Integration

The Graphiti-powered knowledge graph captures:

Entities

  • Targets and hosts
  • Vulnerabilities
  • Tools and techniques
  • Exploits and payloads

Relationships

  • Tool → Vulnerability mappings
  • Target → Exploit connections
  • Technique → Success patterns
  • Agent → Task relationships

Episodes

  • Complete agent reasoning chains
  • Tool execution records
  • Temporal context windows
  • Success/failure patterns

Temporal Context

  • Time-bounded searches
  • Historical progression
  • Evolution of attacks
  • Learning over time
Agents can query memory with natural language:
# Search for similar vulnerabilities
search_in_memory(
  query="SQL injection techniques for login forms with prepared statements",
  filters={"doc_type": "tool_result", "tool_name": "pentester"}
)

# Search knowledge graph
graphiti_search(
  query="successful SQLi exploitation techniques",
  search_type="successful_tools",
  filters={"entity_label": "exploitation_technique"}
)
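
The combination of a free-text query and metadata filters can be illustrated with a toy in-memory search. This sketch is an assumption-heavy stand-in: it ranks by bag-of-words cosine similarity rather than real embeddings, and `toy_search` and the record layout are hypothetical, not PentAGI's vector store API.

```python
import math
from collections import Counter


def _vector(text: str) -> Counter:
    # Stand-in for an embedding: lowercase word counts.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_search(records: list, query: str, filters: dict, limit: int = 3) -> list:
    """Filter records by exact metadata match, then rank by similarity."""
    q = _vector(query)
    candidates = [r for r in records
                  if all(r.get(k) == v for k, v in filters.items())]
    scored = [(_cosine(q, _vector(r["content"])), r) for r in candidates]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:limit]]
```

Filtering before ranking means a highly similar result from the wrong tool never displaces a relevant one from the requested tool.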

Result Summarization

Large tool outputs are automatically summarized:
allowedSummarizingToolsResult = [
  "terminal",  // Command outputs
  "browser",   // Web page contents
]
Summarization Prompt Template:
  • Preserves critical information (errors, paths, URLs, commands)
  • Maintains actionable insights
  • Structures information logically
  • Reduces size while retaining practical value (max 8KB from 16KB)
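
The size rule can be sketched as a gate around a summarizer. This is an illustration under assumptions, not the actual implementation: it reads the note above as "outputs larger than 16 KB are reduced to at most 8 KB", measures size in UTF-8 bytes, and takes the summarizer as a caller-supplied function; `maybe_summarize` is a hypothetical name.

```python
from typing import Callable

# Tool names copied from the allow-list above.
ALLOWED_SUMMARIZING_TOOLS = {"terminal", "browser"}

# Assumption: thresholds are measured in UTF-8 bytes.
SUMMARIZE_THRESHOLD = 16 * 1024
SUMMARY_LIMIT = 8 * 1024


def maybe_summarize(tool_name: str, output: str,
                    summarize: Callable[[str], str]) -> str:
    """Summarize large outputs from allow-listed tools; pass others through."""
    size = len(output.encode("utf-8"))
    if tool_name not in ALLOWED_SUMMARIZING_TOOLS or size <= SUMMARIZE_THRESHOLD:
        return output
    summary = summarize(output)
    # Enforce the upper bound even if the summarizer overshoots;
    # errors="ignore" drops a multi-byte character cut at the boundary.
    clipped = summary.encode("utf-8")[:SUMMARY_LIMIT]
    return clipped.decode("utf-8", errors="ignore")
```

In practice `summarize` would call an LLM with the prompt template described above; passing it in as a parameter keeps the size logic testable on its own.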

Configuration

Enable Agent Delegation

Control whether assistants use agent delegation:
.env
# Default behavior for new assistants
ASSISTANT_USE_AGENTS=false  # Toggle agent delegation
When enabled (true):
  • Assistant can delegate to specialist agents
  • Suitable for complex, multi-step testing
  • Higher LLM token usage
When disabled (false):
  • Assistant uses tools directly
  • Faster for simple tasks
  • Lower cost, direct execution
Users can override this default in the UI when creating or editing assistants using the “Use Agents” toggle.
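
The precedence between the environment default and the per-assistant toggle can be sketched as follows. This is a minimal illustration: `ASSISTANT_USE_AGENTS` comes from the `.env` example above, while the function `use_agents`, the fallback to `false` when the variable is unset, and the accepted truthy spellings are assumptions.

```python
import os
from typing import Mapping, Optional


def use_agents(override: Optional[bool] = None,
               environ: Mapping[str, str] = os.environ) -> bool:
    """Resolve whether an assistant should delegate to specialist agents."""
    # A per-assistant setting (the "Use Agents" toggle) wins when present.
    if override is not None:
        return override
    # Assumption: an unset variable behaves like "false".
    raw = environ.get("ASSISTANT_USE_AGENTS", "false")
    # Assumption: these spellings count as enabled.
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```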

Best Practices

Task descriptions:
  • Provide clear, specific task descriptions
  • Include relevant context (target info, discovered findings)
  • Specify expected outcomes
  • Reference previous related work
Memory usage:
  • Query memory before starting new tasks
  • Store successful techniques and payloads
  • Use semantic search with detailed queries
  • Leverage the knowledge graph for historical context
Adaptive planning:
  • Let agents refine subtasks based on findings
  • Enable dynamic task prioritization
  • Allow for exploration of unexpected paths
  • Review and adjust plans as tests progress
Agent selection:
  • Let the Installer agent handle tool setup
  • Use the Searcher for initial reconnaissance
  • Delegate exploit development to the Coder
  • Rely on the Pentester for vulnerability testing

Security Tools

Explore 20+ professional pentesting tools

Reporting

Generate comprehensive vulnerability reports

Monitoring

Track agent performance and LLM metrics

Architecture

Deep dive into system architecture
