
Overview

The TargetedPentestAgent is a pentest-focused specialization of the OffensiveSecurityAgent. It performs targeted security assessments against specific objectives, documents vulnerabilities with proof-of-concept exploits, and provides detailed remediation guidance. Unlike reconnaissance agents that perform broad discovery, the TargetedPentestAgent focuses deeply on testing specific endpoints or objectives that have already been identified.

Key Features

  • Objective-Driven Testing: Tests specific objectives (e.g., “Test for SQL injection on /api/users”)
  • Blackbox Methodology: No access to source code — tests purely through external interaction
  • Automatic PoC Generation: Creates proof-of-concept scripts for confirmed vulnerabilities
  • Findings Deduplication: Integrates with FindingsRegistry to prevent duplicate reporting
  • Session Management: Supports authenticated testing with session cookies/headers
  • Structured Results: Returns typed results with all findings and file paths

Constructor

new TargetedPentestAgent(opts: PentestAgentInput)
  • opts (PentestAgentInput, required): Configuration object for the pentest agent

PentestAgentInput

  • model (AIModel, required): AI model identifier (e.g., "claude-sonnet-4-20250514")
  • session (SessionInfo, required): Session providing paths for findings, POCs, logs, etc.
  • target (string, required): The URL / host to test
  • objectives (string[], required): One or more testing objectives (e.g., “Test for SQL injection on /api/users”)
  • authConfig (AIAuthConfig): Optional per-provider API key overrides
  • onStepFinish (StreamTextOnStepFinishCallback<ToolSet>): Callback fired after each agent step
  • abortSignal (AbortSignal): AbortSignal to cancel the agent mid-run
  • sandbox (UnifiedSandbox): When set, tools execute inside this sandbox instead of locally
  • findingsRegistry (FindingsRegistry): Shared findings registry for cross-agent dedup
  • callbacks (ConsumeCallbacks): Callbacks for stream events
  • credentialManager (CredentialManager): In-memory credential store for secret-free agent prompts
  • stopWhen (StopCondition<ToolSet>): Override the default stop condition
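
The four required fields listed above can be checked before constructing the agent. The following is a minimal sketch, not part of @pensar/apex: `missingRequiredFields` and `REQUIRED_FIELDS` are hypothetical names, and the check only tests for presence, not for correct types.

```typescript
// Hypothetical helper (not part of @pensar/apex): reports which of the fields
// documented as required are absent from a candidate options object.
const REQUIRED_FIELDS = ["model", "session", "target", "objectives"] as const;

function missingRequiredFields(opts: Record<string, unknown>): string[] {
  const missing: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = opts[field];
    if (value === undefined || value === null || value === "") {
      missing.push(field);
    }
  }
  // The field list asks for "one or more" objectives, so an empty array
  // is treated as missing too.
  const objectives = opts["objectives"];
  if (Array.isArray(objectives) && objectives.length === 0) {
    missing.push("objectives");
  }
  return missing;
}
```

Calling it with an object that only sets `target` would report `model`, `session`, and `objectives` as missing, which can give a clearer error than a failure partway through a run.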

Result Type

The consume() method returns a PentestResult:
  • findings (Finding[]): All findings discovered during the run
  • findingsPath (string): Absolute path to the session’s findings directory
  • pocsPath (string): Absolute path to the session’s POC scripts directory

Finding Object

  • title (string): Short descriptive title for the vulnerability
  • severity ('CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW'): Severity level of the vulnerability
  • description (string): Detailed description of the vulnerability
  • impact (string): Explanation of the potential impact if exploited
  • evidence (string): Evidence demonstrating the vulnerability (request/response, screenshots, etc.)
  • endpoint (string): The affected endpoint or URL
  • pocPath (string): Path to the proof-of-concept script
  • remediation (string): Step-by-step remediation guidance
  • references (string): External references (OWASP, CVEs, documentation)
  • toolCallDescription (string): Description of why this vulnerability was documented
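
Because severity is a closed set of four string values, findings are easy to triage after a run. The sketch below is illustrative only: `FindingSummary` is a local subset of the documented Finding fields, and `sortBySeverity` and `countBySeverity` are hypothetical helpers rather than library functions.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Local subset of the documented Finding fields; the real type carries more.
interface FindingSummary {
  title: string;
  severity: Severity;
  endpoint: string;
}

// Lower rank means more urgent.
const SEVERITY_RANK: Record<Severity, number> = {
  CRITICAL: 0,
  HIGH: 1,
  MEDIUM: 2,
  LOW: 3,
};

// Returns a new array ordered from most to least severe; the input is not mutated.
function sortBySeverity<T extends FindingSummary>(findings: readonly T[]): T[] {
  return [...findings].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
}

// Tallies how many findings fall into each severity level.
function countBySeverity(
  findings: readonly FindingSummary[],
): Record<Severity, number> {
  const counts: Record<Severity, number> = { CRITICAL: 0, HIGH: 0, MEDIUM: 0, LOW: 0 };
  for (const finding of findings) {
    counts[finding.severity] += 1;
  }
  return counts;
}
```

Passing the `findings` array returned by `consume()` to `sortBySeverity` should work as long as each element carries the `title`, `severity`, and `endpoint` fields described above.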

Active Tools

The TargetedPentestAgent uses the following tools:
  • execute_command - Run exploit scripts and security testing tools
  • http_request - Send targeted HTTP requests to test endpoints
  • document_vulnerability - Document confirmed vulnerabilities
  • create_poc - Create proof-of-concept exploit scripts
  • response - Submit final testing summary (auto-injected)
  • browser_navigate - Navigate to pages for evidence collection
  • browser_snapshot - Capture DOM snapshots
  • browser_screenshot - Take screenshots for evidence
  • browser_click - Interact with page elements for interactive testing
  • browser_fill - Fill form fields for form-based attacks (XSS, injection)
  • email_list_inboxes - List available email inboxes (if configured)
  • email_list_messages - List messages (for testing email-based flows)
  • email_search_messages - Search for specific messages
  • email_get_message - Retrieve full message content

Usage Examples

Basic Vulnerability Testing

import { TargetedPentestAgent } from "@pensar/apex";
import { createSession } from "@pensar/apex/session";

const session = await createSession({
  name: "SQL Injection Test",
  targets: ["https://api.example.com"],
});

const agent = new TargetedPentestAgent({
  target: "https://api.example.com",
  objectives: [
    "Test for SQL injection on /api/users endpoint",
    "Test for authentication bypass",
  ],
  model: "claude-sonnet-4-20250514",
  session,
});

const { findings, findingsPath, pocsPath } = await agent.consume({
  onTextDelta: (delta) => {
    process.stdout.write(delta.text);
  },
  onToolCall: (delta) => {
    console.log(`→ ${delta.toolName}`);
  },
});

console.log(`\nFound ${findings.length} vulnerabilities`);
console.log(`Findings: ${findingsPath}`);
console.log(`POCs: ${pocsPath}`);

// Print findings summary
for (const finding of findings) {
  console.log(`\n[${finding.severity}] ${finding.title}`);
  console.log(`Endpoint: ${finding.endpoint}`);
  console.log(`PoC: ${finding.pocPath}`);
}

Authenticated Testing

import { TargetedPentestAgent } from "@pensar/apex";
import { createSession } from "@pensar/apex/session";
import { mkdirSync, writeFileSync } from "fs";
import { dirname, join } from "path";

const session = await createSession({
  name: "Authenticated API Test",
  targets: ["https://api.example.com"],
});

// Create auth-data.json with authenticated session
const authDataPath = join(session.rootPath, "auth", "auth-data.json");
// Ensure the auth directory exists before writing (a no-op if it is already there)
mkdirSync(dirname(authDataPath), { recursive: true });
writeFileSync(authDataPath, JSON.stringify({
  authenticated: true,
  cookies: "session_id=abc123; token=xyz789",
  headers: {
    "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
    "X-API-Key": "secret_key_123",
  },
  strategy: "bearer_token",
}));

const agent = new TargetedPentestAgent({
  target: "https://api.example.com",
  objectives: [
    "Test for IDOR on /api/users/{id} endpoint",
    "Test for privilege escalation",
  ],
  model: "claude-sonnet-4-20250514",
  session,
});

const { findings } = await agent.consume({
  onTextDelta: (delta) => process.stdout.write(delta.text),
});

With Findings Registry (Deduplication)

import { TargetedPentestAgent } from "@pensar/apex";
import { FindingsRegistry } from "@pensar/apex/findings";
import { createSession } from "@pensar/apex/session";

const session = await createSession({
  name: "Multi-Agent Test",
  targets: ["https://example.com"],
});

const registry = new FindingsRegistry();

// First agent
const agent1 = new TargetedPentestAgent({
  target: "https://example.com",
  objectives: ["Test /api/login for SQL injection"],
  model: "claude-sonnet-4-20250514",
  session,
  findingsRegistry: registry,
});

const result1 = await agent1.consume();
console.log(`Agent 1 found ${result1.findings.length} vulnerabilities`);

// Second agent with same registry - won't duplicate findings
const agent2 = new TargetedPentestAgent({
  target: "https://example.com",
  objectives: ["Test /api/users for injection vulnerabilities"],
  model: "claude-sonnet-4-20250514",
  session,
  findingsRegistry: registry,
});

const result2 = await agent2.consume();
console.log(`Agent 2 found ${result2.findings.length} new vulnerabilities`);
console.log(`Total unique findings: ${registry.size}`);
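
To make the deduplication idea concrete, here is a minimal sketch of how a shared registry could reject repeats. It is not the FindingsRegistry implementation: the class name `SketchFindingsRegistry`, the `register` method, and the choice of endpoint plus title as the dedup key are all assumptions made for illustration. Only the `size` property mirrors something shown in the example above.

```typescript
// Minimal shape needed for the sketch; the documented Finding has more fields.
interface RegistryFinding {
  title: string;
  endpoint: string;
}

// Hypothetical stand-in for a shared registry. The real FindingsRegistry may
// use a different key or a fuzzier comparison.
class SketchFindingsRegistry {
  private readonly seen = new Map<string, RegistryFinding>();

  // Assumed key: endpoint without trailing slashes plus a case-insensitive title.
  private static keyFor(finding: RegistryFinding): string {
    const endpoint = finding.endpoint.trim().replace(/\/+$/, "");
    const title = finding.title.trim().toLowerCase();
    return `${endpoint}::${title}`;
  }

  // Returns true if the finding was new and stored, false if it was a duplicate.
  register(finding: RegistryFinding): boolean {
    const key = SketchFindingsRegistry.keyFor(finding);
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.set(key, finding);
    return true;
  }

  get size(): number {
    return this.seen.size;
  }
}
```

Because both agents hold a reference to the same instance, a finding reported by the first agent is already in the map when the second agent tries to report it.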

With Sandbox Execution

import { TargetedPentestAgent } from "@pensar/apex";
import { createSandbox } from "@pensar/apex/sandbox";
import { createSession } from "@pensar/apex/session";

const session = await createSession({
  name: "Sandboxed Test",
  targets: ["https://example.com"],
});

const sandbox = await createSandbox({
  type: "docker",
  image: "pentest-tools:latest",
});

const agent = new TargetedPentestAgent({
  target: "https://example.com",
  objectives: ["Test for RCE on /api/upload"],
  model: "claude-sonnet-4-20250514",
  session,
  sandbox, // All execute_command and create_poc calls run in sandbox
});

const { findings } = await agent.consume({
  onTextDelta: (delta) => process.stdout.write(delta.text),
});

await sandbox.cleanup();

Exfil Mode (CTF/Red Team)

import { TargetedPentestAgent } from "@pensar/apex";
import { createSession } from "@pensar/apex/session";

const session = await createSession({
  name: "CTF Challenge",
  targets: ["https://ctf.example.com"],
  config: {
    exfilMode: true, // Enables data extraction focus
    outcomeGuidance: "Find and extract the flag in format FLAG{...}",
  },
});

const agent = new TargetedPentestAgent({
  target: "https://ctf.example.com",
  objectives: [
    "Find SSRF vulnerability",
    "Pivot through SSRF to access internal services",
    "Extract the flag",
  ],
  model: "claude-sonnet-4-20250514",
  session,
});

const { findings } = await agent.consume({
  onTextDelta: (delta) => process.stdout.write(delta.text),
});

Custom Stop Condition

import { TargetedPentestAgent } from "@pensar/apex";
import { createSession } from "@pensar/apex/session";
import { stepCountIs } from "ai";

const session = await createSession({
  name: "Limited Test",
  targets: ["https://example.com"],
});

const agent = new TargetedPentestAgent({
  target: "https://example.com",
  objectives: ["Quick XSS scan"],
  model: "claude-sonnet-4-20250514",
  session,
  stopWhen: stepCountIs(50), // Stop after 50 steps regardless
});

const { findings } = await agent.consume();
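
A stop condition can also be written by hand. The sketch below stops once a given number of document_vulnerability calls have been made. It uses local structural types (`StepLike`, `StopConditionLike`) that are assumed to resemble what the `ai` package passes to a stop condition, namely an object with a `steps` array whose entries expose `toolCalls` with a `toolName`. Whether the result is assignable to `StopCondition<ToolSet>` as-is has not been verified here, and `stopAfterDocumentedFindings` is a hypothetical name.

```typescript
// Assumed minimal shapes; the actual types come from the "ai" package.
interface StepLike {
  toolCalls: ReadonlyArray<{ toolName: string }>;
}

type StopConditionLike = (options: { steps: ReadonlyArray<StepLike> }) => boolean;

// Builds a condition that becomes true once `max` document_vulnerability
// tool calls have been observed across all steps so far.
function stopAfterDocumentedFindings(max: number): StopConditionLike {
  return ({ steps }) => {
    let documented = 0;
    for (const step of steps) {
      for (const call of step.toolCalls) {
        if (call.toolName === "document_vulnerability") {
          documented += 1;
        }
      }
    }
    return documented >= max;
  };
}
```

Since stopWhen overrides the default stop condition, a custom condition like this one takes over responsibility for ending the run.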

Testing Methodology

The TargetedPentestAgent follows a structured testing methodology:
  1. PLAN: States objectives and outlines testing plan before executing any tools. Describes which attack techniques, payloads, and tools will be used.
  2. VERIFY: Confirms the target endpoint exists and is reachable. Understands basic behavior (response format, parameters, auth requirements).
  3. PREPARE: Researches applicable payloads and attack techniques. Crafts payloads tailored to the target’s technology and behavior.
  4. TEST: Executes targeted attacks methodically, one payload/technique at a time. Observes responses carefully for vulnerability indicators.
  5. EXPLOIT: When a vulnerability is confirmed, creates a proof-of-concept script that reliably demonstrates it.
  6. DOCUMENT: Documents every confirmed finding with evidence, impact assessment, and remediation steps.
  7. FINISH: After testing ALL objectives, calls the response tool to submit final summary.

Testing Modes

Standard blackbox penetration testing:
  • Test for common vulnerabilities
  • Document confirmed findings
  • Provide remediation guidance
  • No data extraction focus

Rate Limiting Handling

The agent automatically handles rate limiting:
  1. Detects HTTP 429 responses
  2. Backs off with increasing delays between retries:
    • First retry: 5 seconds
    • Second retry: 30 seconds
    • Third retry: 120 seconds
  3. After 3 attempts, notes rate limiting in summary and moves to next objective
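
The steps above can be sketched as a small retry wrapper. This is an illustration of the documented schedule (5, 30, then 120 seconds), not the agent's internal code: `requestWithBackoff`, `ResponseLike`, and the injectable `sleep` parameter are hypothetical, and the sketch reads "3 attempts" as three retries after the initial request.

```typescript
// Delay before each retry, in seconds, as listed in the schedule above.
const BACKOFF_SECONDS = [5, 30, 120] as const;

// Only the status code matters for this sketch.
interface ResponseLike {
  status: number;
}

// Sends a request and retries on HTTP 429 using the fixed schedule.
// `sleep` is injectable so the delays can be skipped or recorded in tests.
async function requestWithBackoff<T extends ResponseLike>(
  send: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ response: T; rateLimited: boolean }> {
  let response = await send();
  for (const seconds of BACKOFF_SECONDS) {
    if (response.status !== 429) {
      break;
    }
    await sleep(seconds * 1000);
    response = await send();
  }
  // rateLimited stays true only if the last retry was also rejected with 429,
  // which is the point where the caller would move on to the next objective.
  return { response, rateLimited: response.status === 429 };
}
```

A caller could check `rateLimited` and, when it is true, record the rate limit in its summary rather than retrying further.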

Best Practices

Blackbox Only: The agent must NEVER read, view, or analyze source code. It performs blackbox testing through external interaction only.
PoC Required: Always create a working proof-of-concept script BEFORE calling document_vulnerability. The tool is only for confirmed, exploitable vulnerabilities.
Browser Tools: Use browser_screenshot to capture evidence of successful exploits (XSS alerts, error pages, authentication bypass, etc.)
No False Positives: Never document a vulnerability without confirming it. If unable to confirm, describe it in the final response summary instead.

Convenience Runner

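import { runPentestAgent } from "@pensar/apex/agents/specialized/pentest";
import { createSession } from "@pensar/apex/session";

// The runner still needs a session, created the same way as in the examples above
const session = await createSession({
  name: "Convenience Runner Test",
  targets: ["https://example.com"],
});

const result = await runPentestAgent({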
  target: "https://example.com",
  objectives: ["Test for XSS on /search"],
  model: "claude-sonnet-4-20250514",
  session,
});

// Automatically logs progress and results to console
