Autonomous Penetration Testing

Overview

PentAGI’s autonomous penetration testing system uses AI agents to plan, execute, and adapt security testing workflows. A primary agent delegates tasks to specialist agents, each focused on one aspect of penetration testing.

Multi-Agent System

Specialized AI agents working together as a coordinated team

Autonomous Execution

Self-directed testing with minimal human intervention

Adaptive Planning

Dynamic task generation based on real-time findings

Smart Memory

Long-term learning from past tests and successes

Multi-Agent Architecture

PentAGI uses a multi-agent system in which each agent has specialized capabilities:

Primary Agent (Orchestrator)

The primary agent coordinates the entire penetration testing flow, delegating tasks to specialist agents:

// Primary executor has access to all specialized agents
PrimaryExecutor:
  - done: Complete the task
  - ask: Ask user for input (optional)
  - advice: Get guidance from mentor
  - coder: Delegate code development
  - maintenance: Delegate tool installation
  - memorist: Search long-term memory
  - pentester: Delegate security testing
  - search: Delegate research tasks
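
The listing above can be read as a dispatch table: the orchestrator picks a tool name and the call is routed to the matching handler. The following is a minimal Python sketch of that idea, not PentAGI's actual implementation. Only the tool names come from the listing; `make_handler`, `HANDLERS`, and `dispatch` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict

# Tool names taken from the PrimaryExecutor listing; everything else is illustrative.
PRIMARY_TOOLS = ("done", "ask", "advice", "coder", "maintenance",
                 "memorist", "pentester", "search")


def make_handler(tool: str) -> Callable[[str], str]:
    """Return a stand-in handler that only records which tool was called."""
    def handler(message: str) -> str:
        return f"[{tool}] {message}"
    return handler


# Hypothetical dispatch table: one stand-in handler per tool name.
HANDLERS: Dict[str, Callable[[str], str]] = {t: make_handler(t) for t in PRIMARY_TOOLS}


def dispatch(tool: str, message: str) -> str:
    """Route a call to its handler, rejecting names outside the listing."""
    if tool not in HANDLERS:
        raise ValueError(f"unknown tool: {tool!r}")
    return HANDLERS[tool](message)


print(dispatch("pentester", "Validate SQL injection on the login form"))
```

Rejecting unknown names up front keeps a malformed tool call from silently doing nothing.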

Specialist Agents

Pentester Agent

Purpose: Execute penetration testing attacks and vulnerability assessments

Capabilities:

Execute commands in sandboxed environment
Read and write files
Use browser for web reconnaissance
Search guides and code samples
Delegate to coder, installer, adviser, memorist, and searcher

Tools Available:

terminal: Execute security tools
file: Manage exploit files
browser: Web reconnaissance
search_guide: Find pentesting guides
store_guide: Save successful techniques
graphiti_search: Query knowledge graph for historical context
Agent delegation tools

Example Use Cases:

Exploit development and testing
Vulnerability validation
Post-exploitation activities

Coder Agent

Purpose: Develop custom exploits, scripts, and tools

Capabilities:

Write code for specific tasks
Search code samples in memory
Store reusable code snippets
Browse documentation
Delegate to installer, adviser, memorist, and searcher

Tools Available:

browser: Access documentation
search_code: Find relevant code samples
store_code: Save working code
graphiti_search: Find previous code patterns
Agent delegation tools

Example Use Cases:

Custom exploit development
Automation script creation
Payload generation

Installer Agent (Maintenance)

Purpose: Maintain environment and install security tools

Capabilities:

Execute commands in container
Install and configure tools
Manage dependencies
Browse installation documentation
Store installation guides

Tools Available:

terminal: Install packages and tools
file: Manage configuration files
browser: Access installation guides
search_guide: Find setup instructions
store_guide: Save installation procedures
Agent delegation tools

Example Use Cases:

Tool installation (nmap, metasploit, sqlmap)
Environment configuration
Dependency resolution

Searcher Agent

Purpose: Research and gather intelligence from multiple sources

Capabilities:

Search multiple search engines
Browse web pages
Query vector database
Store research findings

Tools Available:

google: Fast queries for public links
duckduckgo: Anonymous searches
tavily: Detailed research reports
traversaal: Relevant web links
perplexity: Complex LLM-augmented research
searxng: Privacy-focused meta-search
browser: Deep web page analysis
search_answer: Query stored answers
store_answer: Save research findings
Delegate to memorist

Example Use Cases:

CVE research
Exploit discovery
Vulnerability documentation
Target reconnaissance

Memorist Agent (Archivist)

Purpose: Search and retrieve information from long-term memory

Capabilities:

Search vector database with semantic queries
Query Graphiti knowledge graph
Execute commands to gather context

Tools Available:

search_in_memory: Semantic search in vector DB
graphiti_search: Query knowledge graph for:
- Temporal windows (time-bounded searches)
- Entity relationships (graph traversal)
- Diverse results (anti-redundancy)
- Episode context (full agent reasoning)
- Successful tools (proven techniques)
- Recent context (latest findings)
- Entity by label (type-specific searches)
terminal: Execute commands for context
file: Read historical files

Example Use Cases:

Retrieve past successful exploits
Find similar vulnerability patterns
Access historical test results

Adviser Agent (Mentor)

Purpose: Provide expert guidance and strategic advice

Capabilities:

Analyze complex situations
Suggest optimal approaches
Provide strategic recommendations

Example Use Cases:

Difficult decision-making
Complex attack planning
Troubleshooting failed attempts

Autonomous Execution

Task Execution Flow

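PentAGI executes each task as a sequence of subtasks: the primary agent delegates work to specialist agents, and the plan is refined as results come in.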

Agent Delegation

Agents can delegate subtasks to other agents:

{
  "name": "pentester",
  "message": "Test SQL injection on target login form at http://target.com/login",
  "args": {
    "task": "Exploit SQL injection vulnerability",
    "target": "http://target.com/login",
    "context": "Login form with username and password fields"
  }
}
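
A delegation call like the one above could be assembled and checked before it is sent. This is a hedged sketch rather than PentAGI's API: `build_delegation` is a hypothetical helper, the set of valid names is taken from the Primary Agent listing on this page, and the three top-level keys mirror the JSON example.

```python
import json
from typing import Any, Dict

# Specialist names from this page's Primary Agent tool listing.
SPECIALISTS = {"advice", "coder", "maintenance", "memorist", "pentester", "search"}


def build_delegation(name: str, message: str, **args: Any) -> Dict[str, Any]:
    """Assemble a delegation call with the same three keys as the JSON example."""
    if name not in SPECIALISTS:
        raise ValueError(f"unknown specialist: {name!r}")
    if not message.strip():
        raise ValueError("message must not be empty")
    return {"name": name, "message": message, "args": dict(args)}


call = build_delegation(
    "pentester",
    "Test SQL injection on target login form at http://target.com/login",
    task="Exploit SQL injection vulnerability",
    target="http://target.com/login",
    context="Login form with username and password fields",
)
print(json.dumps(call, indent=2))
```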

Adaptive Planning

Dynamic Subtask Generation

PentAGI uses a Generator Agent to create adaptive task plans:

GeneratorExecutor:
  - subtask_list: Submit new subtask plan
  - memorist: Query historical data
  - search: Research similar tests
  - terminal: Gather target information
  - browser: Reconnaissance

Subtask Refinement

The Refiner Agent modifies subtasks based on execution results:

Delta Operations

{
  "operations": [
    {
      "type": "add",
      "position": 2,
      "title": "Test XSS vulnerability",
      "description": "Found potential XSS in search form"
    },
    {
      "type": "modify",
      "id": 3,
      "updates": {
        "description": "Update: SQL injection confirmed, proceed with exploitation"
      }
    },
    {
      "type": "remove",
      "id": 5,
      "reason": "Already completed in previous step"
    },
    {
      "type": "reorder",
      "id": 4,
      "new_position": 1,
      "reason": "Higher priority"
    }
  ]
}
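
To make the four operation types concrete, here is a small Python sketch that applies such a list to a plan. It rests on assumptions not stated in the source: positions are treated as 0-based list indexes, operations are applied in order, and new subtasks receive the next free id. `apply_operations` and the plan layout are hypothetical.

```python
from typing import Any, Dict, List

Subtask = Dict[str, Any]


def _index_of(plan: List[Subtask], subtask_id: int) -> int:
    """Find the list index of the subtask with the given id."""
    for i, subtask in enumerate(plan):
        if subtask["id"] == subtask_id:
            return i
    raise KeyError(f"no subtask with id {subtask_id}")


def apply_operations(plan: List[Subtask], operations: List[Dict[str, Any]]) -> List[Subtask]:
    """Return a new plan with the operations applied in order."""
    result = [dict(s) for s in plan]  # copy so the input plan stays untouched
    next_id = max((s["id"] for s in result), default=0) + 1
    for op in operations:
        kind = op["type"]
        if kind == "add":
            new = {"id": next_id, "title": op["title"], "description": op["description"]}
            next_id += 1
            result.insert(op["position"], new)  # assumed 0-based position
        elif kind == "modify":
            result[_index_of(result, op["id"])].update(op["updates"])
        elif kind == "remove":
            del result[_index_of(result, op["id"])]
        elif kind == "reorder":
            moved = result.pop(_index_of(result, op["id"]))
            result.insert(op["new_position"], moved)
        else:
            raise ValueError(f"unknown operation type: {kind!r}")
    return result


plan = [{"id": i, "title": f"Subtask {i}", "description": ""} for i in range(1, 6)]
ops = [
    {"type": "add", "position": 2, "title": "Test XSS vulnerability",
     "description": "Found potential XSS in search form"},
    {"type": "modify", "id": 3, "updates": {"description": "SQL injection confirmed"}},
    {"type": "remove", "id": 5},
    {"type": "reorder", "id": 4, "new_position": 1},
]
print([s["id"] for s in apply_operations(plan, ops)])  # [1, 4, 2, 6, 3]
```

Addressing subtasks by id rather than by index means earlier insertions and removals in the same batch do not shift the target of later operations.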

Intelligent Memory

Vector Store Memory

Automatic storage of tool execution results:

allowedStoringInMemoryTools = [
  "terminal",      // Command outputs
  "file",          // File contents
  "search",        // Research results
  "google",        // Search results
  "duckduckgo",    // Search results
  "tavily",        // Research reports
  "traversaal",    // Web findings
  "perplexity",    // LLM-augmented research
  "searxng",       // Meta-search results
  "maintenance",   // Installation procedures
  "coder",         // Code solutions
  "pentester",     // Test results
  "advice",        // Expert guidance
]
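
An allowlist like this is typically consulted before a result is written to the store. The sketch below shows that check in Python under stated assumptions: the in-memory list stands in for the vector store, `should_store` and `record_result` are hypothetical helpers, and the `doc_type` and `tool_name` fields are borrowed from the search filter shown later on this page.

```python
from typing import Dict, List

# Tool names copied from the allowlist above.
ALLOWED_STORING_IN_MEMORY_TOOLS = frozenset({
    "terminal", "file", "search", "google", "duckduckgo", "tavily",
    "traversaal", "perplexity", "searxng", "maintenance", "coder",
    "pentester", "advice",
})


def should_store(tool_name: str) -> bool:
    """True when results of this tool are eligible for long-term storage."""
    return tool_name in ALLOWED_STORING_IN_MEMORY_TOOLS


def record_result(store: List[Dict[str, str]], tool_name: str, output: str) -> bool:
    """Append the result to the stand-in store if the tool is allowlisted and output is non-empty."""
    if not should_store(tool_name) or not output:
        return False
    store.append({"doc_type": "tool_result", "tool_name": tool_name, "content": output})
    return True


memory: List[Dict[str, str]] = []
record_result(memory, "terminal", "22/tcp open ssh")
record_result(memory, "browser", "<html>...</html>")  # not in the allowlist, so skipped
print(len(memory))  # 1
```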

Knowledge Graph Integration

Graphiti-powered knowledge graph captures:

Entities

Targets and hosts
Vulnerabilities
Tools and techniques
Exploits and payloads

Relationships

Tool → Vulnerability mappings
Target → Exploit connections
Technique → Success patterns
Agent → Task relationships

Episodes

Complete agent reasoning chains
Tool execution records
Temporal context windows
Success/failure patterns

Temporal Context

Time-bounded searches
Historical progression
Evolution of attacks
Learning over time

Semantic Search

Agents can query memory with natural language:

# Search for similar vulnerabilities
search_in_memory(
  query="SQL injection techniques for login forms with prepared statements",
  filters={"doc_type": "tool_result", "tool_name": "pentester"}
)

# Search knowledge graph
graphiti_search(
  query="successful SQLi exploitation techniques",
  search_type="successful_tools",
  filters={"entity_label": "exploitation_technique"}
)
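
The calls above are schematic. To illustrate how a filtered semantic query can work in principle, the following Python sketch first narrows documents by metadata and then ranks them by cosine similarity. It deliberately swaps in a toy word-count embedding for a real embedding model, and `toy_search_in_memory` is a hypothetical function, not the PentAGI tool itself.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Document = Dict[str, str]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_search_in_memory(docs: List[Document], query: str,
                         filters: Dict[str, str], limit: int = 3) -> List[Tuple[float, Document]]:
    """Apply exact-match metadata filters, then rank the survivors by similarity."""
    q = embed(query)
    candidates = [d for d in docs if all(d.get(k) == v for k, v in filters.items())]
    scored = [(cosine(q, embed(d["content"])), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


docs = [
    {"doc_type": "tool_result", "tool_name": "pentester",
     "content": "sql injection on login forms bypassed authentication"},
    {"doc_type": "tool_result", "tool_name": "terminal",
     "content": "nmap scan found open ports"},
    {"doc_type": "tool_result", "tool_name": "pentester",
     "content": "xss payload reflected in search form"},
]
hits = toy_search_in_memory(docs, "sql injection login forms",
                            {"doc_type": "tool_result", "tool_name": "pentester"})
print(hits[0][1]["content"])
```

Filtering before ranking keeps results from other tools out of the candidate set regardless of how similar their text is.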

Result Summarization

Large tool outputs are automatically summarized:

allowedSummarizingToolsResult = [
  "terminal",  // Command outputs
  "browser",   // Web page contents
]

Summarization Prompt Template:

Preserves critical information (errors, paths, URLs, commands)
Maintains actionable insights
Structures information logically
Reduces size while retaining practical value (at most 8 KB, down from 16 KB)
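
The gating logic around summarization might look like the Python sketch below. It assumes one reading of the size figures: outputs above 16 KB are summarized, and the summary is capped at 8 KB. The thresholds, `needs_summary`, `process_result`, and the pluggable `summarize` callable (which stands in for the LLM call) are all assumptions for illustration.

```python
from typing import Callable

# Tool names from the list above; the byte limits are an assumed reading of "8KB from 16KB".
SUMMARIZABLE_TOOLS = frozenset({"terminal", "browser"})
SUMMARIZE_THRESHOLD = 16 * 1024  # assumed size above which summarization kicks in
SUMMARY_LIMIT = 8 * 1024         # assumed maximum size of a summary


def needs_summary(tool_name: str, output: str) -> bool:
    """Summarize only allowlisted tools whose output exceeds the threshold."""
    return (tool_name in SUMMARIZABLE_TOOLS
            and len(output.encode("utf-8")) > SUMMARIZE_THRESHOLD)


def process_result(tool_name: str, output: str, summarize: Callable[[str], str]) -> str:
    """Return the output unchanged, or a summary capped at SUMMARY_LIMIT bytes."""
    if not needs_summary(tool_name, output):
        return output
    summary = summarize(output)
    # Enforce the cap even if the summarizer overshoots; drop any split multibyte character.
    return summary.encode("utf-8")[:SUMMARY_LIMIT].decode("utf-8", errors="ignore")


big_output = "A" * 20000
short = process_result("terminal", big_output, lambda text: text[:100])
print(len(short))  # 100
```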

Configuration

Enable Agent Delegation

Control whether assistants use agent delegation:

.env

# Default behavior for new assistants (toggles agent delegation)
ASSISTANT_USE_AGENTS=false

When enabled (true):

Assistant can delegate to specialist agents
Suitable for complex, multi-step testing
Higher LLM token usage

When disabled (false):

Assistant uses tools directly
Faster for simple tasks
Lower cost, direct execution

Users can override this default in the UI when creating or editing assistants using the “Use Agents” toggle.
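
The interplay between the environment default and the per-assistant override can be sketched as follows. This is an illustrative Python sketch, not PentAGI's code: `default_use_agents` and `resolve_use_agents` are hypothetical, and the accepted truthy spellings beyond `true` are an assumption, since the source only shows `true` and `false`.

```python
import os
from typing import Mapping, Optional

# Assumed truthy spellings; the documentation itself only shows "true" and "false".
TRUE_VALUES = {"1", "true", "yes", "on"}


def default_use_agents(env: Mapping[str, str] = os.environ) -> bool:
    """Read the ASSISTANT_USE_AGENTS default, treating a missing variable as false."""
    return env.get("ASSISTANT_USE_AGENTS", "false").strip().lower() in TRUE_VALUES


def resolve_use_agents(ui_choice: Optional[bool],
                       env: Mapping[str, str] = os.environ) -> bool:
    """An explicit per-assistant choice wins; otherwise fall back to the default."""
    return default_use_agents(env) if ui_choice is None else ui_choice


print(resolve_use_agents(None, {"ASSISTANT_USE_AGENTS": "true"}))   # True
print(resolve_use_agents(False, {"ASSISTANT_USE_AGENTS": "true"}))  # False
```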

Best Practices

Effective Task Delegation

Provide clear, specific task descriptions
Include relevant context (target info, discovered findings)
Specify expected outcomes
Reference previous related work

Memory Utilization

Query memory before starting new tasks
Store successful techniques and payloads
Use semantic search with detailed queries
Leverage knowledge graph for historical context

Adaptive Workflows

Let agents refine subtasks based on findings
Enable dynamic task prioritization
Allow for exploration of unexpected paths
Review and adjust plans as tests progress

Tool Selection

Let Installer agent handle tool setup
Use Searcher for initial reconnaissance
Delegate exploit development to Coder
Trust Pentester for vulnerability testing

Related Resources

Security Tools

Explore 20+ professional pentesting tools

Reporting

Generate comprehensive vulnerability reports

Monitoring

Track agent performance and LLM metrics

Architecture

Deep dive into system architecture
