Overview

PentAGI allows you to create custom AI assistants tailored to specific penetration testing scenarios. Each assistant can be configured with different LLM models, agent behaviors, and specialized capabilities.
Custom assistants give you fine-grained control over AI behavior, model selection, and agent delegation for different testing scenarios.

Assistant Architecture

PentAGI uses a multi-agent system in which specialized agents handle different aspects of a penetration test.

Creating Your First Assistant

1. Access Assistants page

Navigate to Settings → Assistants in the PentAGI web interface.
2. Click New Assistant

Click the + New Assistant button to open the assistant creation form.
3. Configure basic settings

Enter the assistant details:
  • Name: Descriptive name (e.g., “Web App Specialist”)
  • Description: Purpose and capabilities
  • Use Agents: Toggle to enable/disable agent delegation
4. Select LLM provider

Choose from configured providers:
  • OpenAI (GPT-4.1, o-series)
  • Anthropic (Claude 4, Claude 3.7)
  • Google Gemini (2.5 series)
  • AWS Bedrock (multi-provider)
  • Ollama (local models)
  • Custom (OpenAI-compatible APIs)
5. Save and test

Save the assistant and test it with a simple query to verify the configuration.

Agent Delegation

Agent delegation allows the primary assistant to distribute work among specialized sub-agents.

When to Use Agents

Best for:
  • Complex, multi-step penetration tests
  • Tasks requiring different specialized skills
  • Long-running engagements
  • Scenarios needing research + execution
Benefits:
  • Better task decomposition
  • Specialized expertise per task
  • Parallel execution capability
  • Improved context management
The default behavior is controlled by the ASSISTANT_USE_AGENTS environment variable, but can be toggled per-assistant in the UI.
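
For example, to make delegation the default for new assistants, set the variable in your `.env` file (the boolean value shown here is an assumption based on the toggle described above; check your deployment's reference for the exact accepted values):

```shell
# .env — default for new assistants; still overridable per-assistant in the UI
ASSISTANT_USE_AGENTS=true
```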

Provider-Specific Configuration

Different LLM providers have unique capabilities and configuration options.

Using Provider Config Files

For advanced control, create custom provider configuration files in YAML format:
# Simple tasks - fast, cost-effective
simple:
  model: "gpt-4.1-mini"
  temperature: 0.5
  top_p: 0.5
  n: 1
  max_tokens: 3000
  price:
    input: 0.4
    output: 1.6

# Primary reasoning agent
primary_agent:
  model: "o3-mini"
  n: 1
  max_tokens: 4000
  reasoning:
    effort: low
  price:
    input: 1.1
    output: 4.4

# Complex assistant tasks
assistant:
  model: "o3-mini"
  n: 1
  max_tokens: 6000
  reasoning:
    effort: medium
  price:
    input: 1.1
    output: 4.4

# Code generation and exploit development
coder:
  model: "gpt-4.1"
  temperature: 0.2
  top_p: 0.1
  n: 1
  max_tokens: 6000
  price:
    input: 2.0
    output: 8.0

# Penetration testing agent
pentester:
  model: "o3-mini"
  n: 1
  max_tokens: 4000
  reasoning:
    effort: low
  price:
    input: 1.1
    output: 4.4
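
The `price` fields above feed cost tracking. As a rough sketch of the arithmetic, assuming the input/output figures are USD per 1M tokens (the usual convention; verify against your provider's pricing page), a single agent call costs:

```python
# Hedged sketch: estimate one agent call's cost from a provider file's
# `price` section, assuming prices are USD per 1M tokens.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_input: float, price_output: float) -> float:
    """Return the estimated cost in USD for one agent call."""
    return (input_tokens / 1_000_000) * price_input \
         + (output_tokens / 1_000_000) * price_output

# Example: a `simple` agent call at the gpt-4.1-mini prices above
cost = estimate_cost(input_tokens=2_000, output_tokens=1_000,
                     price_input=0.4, price_output=1.6)
print(f"${cost:.4f}")  # 2000*0.4/1e6 + 1000*1.6/1e6 = 0.0024
```

Running the numbers this way before an engagement helps decide which agent types justify the more expensive models.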

Configuring Provider Paths

Set provider configuration in environment variables:
.env
LLM_SERVER_URL=https://api.openai.com/v1
LLM_SERVER_KEY=your_api_key
LLM_SERVER_MODEL=gpt-4.1-mini
LLM_SERVER_CONFIG_PATH=/path/to/custom-openai.provider.yml

Agent Types and Roles

Understand the different agent types and their specializations:

Simple Agent

Purpose: Fast, lightweight tasks requiring minimal reasoning

Typical tasks:
  • JSON parsing and formatting
  • Quick data lookups
  • Simple transformations
  • Status checks
Recommended models:
  • GPT-4.1-mini (OpenAI)
  • Claude 3.5 Haiku (Anthropic)
  • Gemini 2.5 Flash (Google)
  • Llama 3.1 8B (Ollama)
Configuration tips:
  • Keep max_tokens low (2000-3000)
  • Use moderate temperature (0.5-0.7)
  • Prioritize speed over reasoning depth

Primary Agent

Purpose: Main orchestration and decision-making

Typical tasks:
  • Task planning and decomposition
  • Agent delegation decisions
  • Result synthesis
  • Strategic thinking
Recommended models:
  • o3-mini, o4-mini (OpenAI reasoning)
  • Claude Sonnet 4 (Anthropic)
  • Gemini 2.5 Pro Thinking (Google)
  • Qwen3 32B (Ollama)
Configuration tips:
  • Enable reasoning for complex decisions
  • Use lower temperature (0.2-0.3) for consistency
  • Allow moderate max_tokens (4000-6000)

Assistant Agent

Purpose: Specialized sub-agent for delegated tasks

Typical tasks:
  • Focused research
  • Specific vulnerability testing
  • Tool execution planning
  • Result analysis
Recommended models:
  • o3-mini with medium reasoning (OpenAI)
  • Claude 3.7 Extended Thinking (Anthropic)
  • Gemini 3 Flash Preview (Google)
  • QwQ 32B (Ollama reasoning)
Configuration tips:
  • Higher max_tokens (6000-8000) for detailed work
  • Medium reasoning effort for balanced performance
  • Adjust temperature based on task creativity needs

Researcher Agent

Purpose: Information gathering and reconnaissance

Typical tasks:
  • Target enumeration
  • Technology stack identification
  • Vulnerability research
  • OSINT gathering
Recommended models:
  • GPT-4.1-mini (OpenAI)
  • Claude Haiku 4.5 (Anthropic)
  • Gemini 2.5 Flash (Google)
Configuration tips:
  • Higher temperature (0.7-0.8) for exploration
  • Moderate max_tokens (4000)
  • Enable web search capabilities
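
In a provider config file, these tips map to a section like the following (the `searcher` key follows the multi-model example later on this page; the values are illustrative, not prescriptive):

```yaml
# Researcher/searcher agent - exploratory settings (illustrative values)
searcher:
  model: "gpt-4.1-mini"
  temperature: 0.7   # higher temperature encourages exploration
  n: 1
  max_tokens: 4000   # moderate budget per the tips above
```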

Developer/Coder Agent

Purpose: Exploit development and payload creation

Typical tasks:
  • Writing exploit code
  • Creating custom payloads
  • Tool script generation
  • Bypass technique development
Recommended models:
  • GPT-4.1 (OpenAI)
  • Claude Sonnet 4 (Anthropic)
  • Gemini 2.5 Pro (Google)
  • Qwen3 32B (Ollama)
Configuration tips:
  • Low temperature (0.2-0.3) for precision
  • Low top_p (0.1-0.3) for deterministic output
  • Higher max_tokens (6000-8000) for complete code

Pentester Agent

Purpose: Active penetration testing and exploitation

Typical tasks:
  • Running security tools
  • Executing exploits
  • Vulnerability validation
  • Post-exploitation activities
Recommended models:
  • o3-mini with low reasoning (OpenAI)
  • Claude Sonnet 4 (Anthropic)
  • Gemini 2.5 Flash Thinking (Google)
Configuration tips:
  • Moderate max_tokens (4000)
  • Low reasoning effort for faster execution
  • Balance between speed and accuracy

Advanced Configuration Examples

Multi-Model Strategy

Use different models for different agents to optimize cost and performance:
# Fast tasks - OpenAI mini
simple:
  model: "gpt-4.1-mini"
  temperature: 0.5
  max_tokens: 3000

# Research - OpenAI mini
searcher:
  model: "gpt-4.1-mini"
  temperature: 0.7
  max_tokens: 4000

# Strategic planning - Claude Sonnet (via Bedrock)
primary_agent:
  model: "anthropic.claude-sonnet-4"
  temperature: 1.0
  max_tokens: 4000

# Complex reasoning - Claude with thinking
assistant:
  model: "anthropic.claude-3-7-sonnet"
  temperature: 1.0
  max_tokens: 6000
  reasoning:
    max_tokens: 2048

# Code generation - GPT-4.1
coder:
  model: "gpt-4.1"
  temperature: 0.2
  max_tokens: 6000

# Active testing - OpenAI reasoning
pentester:
  model: "o3-mini"
  max_tokens: 4000
  reasoning:
    effort: low

Creating Extended Context Ollama Models

PentAGI requires models with larger context windows (110K tokens) for complex penetration testing scenarios.
The num_ctx parameter can only be set at model creation time via a Modelfile; it cannot be changed after creation or overridden at runtime.

Qwen3 32B with Extended Context

1. Create Modelfile

Create a file named Modelfile_qwen3_32b_fp16_tc:
FROM qwen3:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
2. Build custom model

ollama create qwen3:32b-fp16-tc -f Modelfile_qwen3_32b_fp16_tc
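
After building, you can confirm the parameters were baked in; `ollama show` with the --parameters flag prints the PARAMETER lines from the Modelfile, and the output should include num_ctx 110000:

```shell
# Verify the extended context was baked into the custom model
ollama show qwen3:32b-fp16-tc --parameters
```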
3. Configure in PentAGI

.env
OLLAMA_SERVER_MODEL=qwen3:32b-fp16-tc
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml
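
The referenced provider file can reuse the same section layout as the OpenAI example earlier on this page, pointing each agent type at the custom model. A minimal illustrative sketch (section names match PentAGI's provider format; the sampling values are assumptions to tune for your hardware):

```yaml
# /opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml (illustrative)
simple:
  model: "qwen3:32b-fp16-tc"
  temperature: 0.5
  max_tokens: 3000

primary_agent:
  model: "qwen3:32b-fp16-tc"
  temperature: 0.3
  max_tokens: 4000
```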

QwQ 32B Reasoning Model

1. Create Modelfile

Create Modelfile_qwq_32b_fp16_tc:
FROM qwq:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.2
PARAMETER top_p 0.7
PARAMETER min_p 0.0
PARAMETER top_k 40
PARAMETER repeat_penalty 1.2
2. Build model

ollama create qwq:32b-fp16-tc -f Modelfile_qwq_32b_fp16_tc

QwQ 32B FP16 requires approximately 71.3 GB VRAM. Ensure your system has sufficient GPU memory.
3. Configure for reasoning tasks

.env
OLLAMA_SERVER_MODEL=qwq:32b-fp16-tc
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml

Assistant Specialization Examples

Web Application Specialist

name: "Web App Specialist"
description: "Specialized in web application vulnerability assessment"
use_agents: true

prompt_template: |
  You are a web application security specialist. Your expertise includes:
  - OWASP Top 10 vulnerabilities
  - SQL injection and XSS
  - Authentication and session management
  - API security testing
  
  For each target:
  1. Map the application structure
  2. Identify input vectors
  3. Test for common vulnerabilities
  4. Validate findings with proof-of-concept
  5. Document remediation guidance
Recommended configuration:
  • Use Agents: Enabled
  • Primary Model: Claude Sonnet 4 or o3-mini
  • Tools Focus: sqlmap, commix, nikto, burpsuite

Network Infrastructure Specialist

name: "Network Infrastructure Specialist"
description: "Focused on network-level penetration testing"
use_agents: true

prompt_template: |
  You specialize in network infrastructure security assessment:
  - Network mapping and reconnaissance
  - Port and service enumeration
  - Vulnerability scanning
  - Exploit development for network services
  
  Methodology:
  1. Passive reconnaissance
  2. Active scanning with nmap
  3. Service enumeration
  4. Vulnerability identification
  5. Exploitation attempts
  6. Post-exploitation and pivoting
Recommended configuration:
  • Use Agents: Enabled
  • Primary Model: o3-mini or Gemini 2.5 Pro
  • Tools Focus: nmap, metasploit, masscan, enum4linux

API Security Specialist

name: "API Security Specialist"
description: "REST and GraphQL API security testing"
use_agents: false

prompt_template: |
  You are an API security expert focusing on:
  - REST and GraphQL endpoints
  - Authentication mechanisms (OAuth, JWT)
  - Authorization bypass
  - Input validation and injection
  - Rate limiting and abuse
  
  Testing approach:
  1. API documentation analysis
  2. Endpoint enumeration
  3. Authentication testing
  4. Authorization boundary testing
  5. Input fuzzing
  6. Business logic vulnerabilities
Recommended configuration:
  • Use Agents: Disabled (focused tasks)
  • Primary Model: GPT-4.1 or Claude Haiku 4.5
  • Tools Focus: curl, jwt_tool, graphql-playground

Testing Assistant Configuration

Verify your assistant works correctly:
1. Create test flow

Create a simple flow using your custom assistant:
Test the assistant by performing a basic port scan on 127.0.0.1
2. Monitor agent behavior

Observe:
  • Whether agents are delegated (if enabled)
  • Tool selection and execution
  • Response quality and accuracy
  • Token usage and cost
3. Adjust configuration

Based on results, tune:
  • Temperature for creativity vs consistency
  • Max tokens for response length
  • Reasoning effort for complexity
  • Agent delegation strategy

Best Practices

Model selection:
  • Use reasoning models (o3, Claude 3.7, QwQ) for complex strategic tasks
  • Use fast models (GPT-4.1-mini, Haiku, Flash) for simple operations
  • Balance cost vs performance based on testing requirements
  • Consider local models (Ollama) for privacy-sensitive engagements
Agent delegation:
  • Enable agents for multi-step, complex engagements
  • Disable agents for quick, focused vulnerability checks
  • Use specialized agents to leverage different model strengths
  • Monitor token usage to optimize delegation strategy
Temperature tuning:
  • Low (0.2-0.3): Code generation, exploit development, precise tasks
  • Medium (0.5-0.7): General pentesting, balanced exploration
  • High (0.7-0.9): Research, creative bypass techniques, OSINT
Context management:
  • Use provider configs to set per-agent token limits
  • Leverage summarization for long engagements
  • Enable Graphiti knowledge graph for semantic memory
  • Monitor memory usage in long-running tests

Next Steps

  • Advanced Techniques: Learn advanced pentesting workflows
  • Best Practices: Security and ethical guidelines
  • First Pentest: Put your assistant to work
  • Provider Configuration: Deep dive into provider settings
