Exploitability Validator Agent

The Exploitability Validator agent orchestrates a multi-stage pipeline that validates vulnerability findings before exploit development, preventing wasted effort on false positives and theoretical vulnerabilities.

Purpose

Validate that findings:

Actually exist (not hallucinated)
Are reachable (not dead code)
Have working exploitation paths (not just theoretical)

Invocation

/validate <target_path> [--vuln-type <type>] [--findings <findings.json>]

Parameters:

target_path: Directory or file to analyze
--vuln-type: Optional focus (e.g., command_injection, sql_injection, xss)
--findings: Optional pre-existing findings to validate (skips Stage 0/A)

Examples:

/validate /home/user/webapp --vuln-type command_injection
/validate /home/user/binary_app --vuln-type format_string
/validate /path/to/code --findings initial_scan.json
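
The invocation syntax above can be modeled as an ordinary argument parser. The following is a minimal sketch, not the agent's actual implementation: the function name `parse_validate_command` is hypothetical, and only the three parameters documented above are recognized.

```python
import argparse
import shlex


def parse_validate_command(command: str) -> argparse.Namespace:
    """Parse a '/validate ...' command string into its documented parameters."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/validate":
        raise ValueError("expected a command starting with /validate")

    parser = argparse.ArgumentParser(prog="/validate")
    parser.add_argument("target_path")  # directory or file to analyze
    parser.add_argument("--vuln-type", dest="vuln_type", default=None)
    parser.add_argument("--findings", default=None)  # pre-existing findings file
    return parser.parse_args(tokens[1:])


args = parse_validate_command("/validate /path/to/code --findings initial_scan.json")
# Per the parameter notes, supplying --findings skips Stage 0 and Stage A.
skip_inventory_and_oneshot = args.findings is not None
```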

Pipeline Stages

Stage 0: Inventory

Build complete function inventory with checklist.json

Stage A: One-Shot

Quick verification - attempt PoC for each candidate

Stage B: Systematic Process

Build attack trees and test hypotheses for unproven findings

Stage C: Sanity Check

Validate findings against actual code to catch hallucinations

Stage D: Ruling

Filter out test code and unrealistic preconditions

Stage E: Feasibility

Run exploit feasibility analysis for memory corruption (binary analysis)

Final Report

Generate comprehensive validation report

Shared Context (MUST-GATEs)

Before executing ANY stage, load: .claude/skills/exploitability-validation/SKILL.md

This contains:

[CONFIG]: Configuration settings
[EXEC]: Execution rules
[GATES]: MUST-GATEs 1-6 that apply to ALL stages
[REMIND]: Critical reminders

MUST-GATEs Overview

GATE-1: Assume Exploitable

Treat all findings as exploitable until proven otherwise. Burden of proof is on disproving, not proving.

GATE-2: No Hedging

Verify all uncertain claims. No “likely”, “probably”, “appears to” without verification.

GATE-3: Document Everything

Update working documents after every action. Maintain audit trail.

GATE-4: Verify Claims

Every claim needs evidence. No assumptions without verification.

GATE-5: No Sampling

Check ALL code per checklist.json. No random sampling or incomplete coverage.

GATE-6: Proof Required

Working PoC or concrete disproof required. No theoretical assessments.

Stage Details

Stage 0: Inventory

Load: .claude/skills/exploitability-validation/stage-0-inventory.md

Execution:

Enumerate all files in target_path
Exclude test/mock files
Extract functions per file
Write checklist.json

Output: checklist.json with complete function inventory

Stage A: One-Shot

Load: .claude/skills/exploitability-validation/stage-a-oneshot.md

Execution:

Assess each function for vuln_type
Attempt PoC for candidates
Write findings.json

Routing:

All PoCs succeed → Skip to Stage C
Some “not_disproven” → Continue to Stage B
All disproven → Report “no exploitable findings” and exit
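
The routing rules above amount to a small decision function. The sketch below assumes each candidate carries one of the status strings used in the final report (confirmed, not_disproven, disproven); the function name and return values are hypothetical. Two cases are not specified by the rules and are handled here by assumption: an empty candidate list exits, and a mix of confirmed and disproven candidates proceeds to Stage C.

```python
def route_after_stage_a(statuses: list[str]) -> str:
    """Return the next step given the Stage A status of each candidate."""
    if not statuses or all(s == "disproven" for s in statuses):
        return "exit"  # report "no exploitable findings"
    if any(s == "not_disproven" for s in statuses):
        return "stage_b"  # unproven findings need the systematic process
    return "stage_c"  # nothing left unproven; go straight to the sanity check
```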

Stage B: Systematic Process

Load: .claude/skills/exploitability-validation/stage-b-process.md

Execution:

Build attack trees for “not_disproven” findings
Form and test hypotheses
Track PROXIMITY metrics
Attempt multiple attack paths
Update working documents

Output:

findings.json (updated)
attack-tree.json
hypotheses.json
disproven.json
attack-paths.json
attack-surface.json

Stage C: Sanity Check

Load: .claude/skills/exploitability-validation/stage-c-sanity.md

Execution:

Verify files exist
Verify code matches verbatim
Verify flow is real
Verify code is reachable

Updates: findings.json with sanity_check results

Findings that fail the sanity check (hallucinations) are removed from active consideration.
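
The first two checks (file exists, code matches verbatim) are mechanical and can be sketched as below. The finding fields `file` and `snippet`, the result shape, and the function name are assumptions for illustration; verifying that the data flow is real and that the code is reachable requires actual analysis and is not covered here.

```python
from pathlib import Path


def sanity_check(finding: dict, target_root: str) -> dict:
    """Check that the cited file exists and the quoted snippet appears verbatim."""
    path = Path(target_root) / finding["file"]
    if not path.is_file():
        return {"passed": False, "reason": "file not found"}
    source = path.read_text(encoding="utf-8", errors="replace")
    if finding["snippet"] not in source:
        return {"passed": False, "reason": "snippet does not match source verbatim"}
    return {"passed": True, "reason": ""}
```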

Stage D: Ruling

Load: .claude/skills/exploitability-validation/stage-d-ruling.md

Execution:

Check for test/mock/example code
Check for unrealistic preconditions
Check for hedging language

Output: findings.json with CONFIRMED findings only
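
The hedging-language check can be approximated with a word-boundary regular expression over a finding's free-text description. This sketch uses only the three phrases quoted in GATE-2; the real list in stage-d-ruling.md may be longer, and the function name is hypothetical. It is meant for prose fields only, since "Likely" is also a legitimate Stage E verdict value.

```python
import re

# The three phrases quoted in GATE-2; assumed to be a subset of the real list.
HEDGING_PATTERN = re.compile(r"\b(likely|probably|appears to)\b", re.IGNORECASE)


def find_hedging(text: str) -> list[str]:
    """Return every hedging phrase found in text, lowercased, in order."""
    return [match.group(0).lower() for match in HEDGING_PATTERN.finditer(text)]
```

The word boundary keeps "unlikely" from matching "likely", so a flagged finding can be sent back for verification without false hits on that word.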

Stage E: Feasibility (Memory Corruption Only)

Load: .claude/skills/exploitability-validation/stage-e-feasibility.md

Applies to:

buffer_overflow
heap_overflow
format_string
use_after_free
double_free
integer_overflow
out_of_bounds_read/write

Skip for:

command_injection
sql_injection
xss
path_traversal
ssrf
deserialization

Execution:

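from packages.exploit_feasibility import (
    analyze_binary,
    format_analysis_summary,
    save_exploit_context,
)

# Vulnerability types that Stage E applies to (see the list above).
# Splitting out_of_bounds_read/write into two names is an assumption.
MEMORY_CORRUPTION_TYPES = {
    "buffer_overflow", "heap_overflow", "format_string", "use_after_free",
    "double_free", "integer_overflow", "out_of_bounds_read", "out_of_bounds_write",
}

# confirmed_findings (the Stage D output) and binary_path (the compiled target)
# are expected to be supplied by the surrounding pipeline.
for finding in confirmed_findings:
    # Other vulnerability types skip Stage E and are left unchanged.
    if finding.vuln_type in MEMORY_CORRUPTION_TYPES:
        result = analyze_binary(binary_path, vuln_type=finding.vuln_type)
        context_file = save_exploit_context(binary_path)

        finding.feasibility = {
            'verdict': result.verdict,  # Likely, Difficult, Unlikely
            'chain_breaks': result.chain_breaks,
            'what_would_help': result.what_would_help,
            'context_file': context_file,
        }

        # Map the feasibility verdict to the finding's final status
        if result.verdict == 'Likely':
            finding.final_status = 'EXPLOITABLE'
        elif result.verdict == 'Difficult':
            finding.final_status = 'CONFIRMED_CONSTRAINED'
        else:  # Unlikely
            finding.final_status = 'CONFIRMED_BLOCKED'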

Working Directory Structure

.out/exploitability-validation-20260304_140000/
├── checklist.json                    # Stage 0 output
├── findings.json                      # Updated through stages
├── attack-tree.json                   # Stage B
├── hypotheses.json                    # Stage B
├── disproven.json                     # Stage B
├── attack-paths.json                  # Stage B
├── attack-surface.json                # Stage B
├── exploit-context.json               # Stage E (if applicable)
└── validation-report.md               # Final report
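
Creating the per-run working directory could look like the sketch below. The `%Y%m%d_%H%M%S` timestamp format is inferred from the directory name in the tree above, and the function name `create_run_dir` is hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def create_run_dir(base: str = ".out", now: Optional[datetime] = None) -> Path:
    """Create and return a timestamped working directory for one validation run."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(base) / f"exploitability-validation-{stamp}"
    # exist_ok=False so two runs never silently share working documents.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```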

Final Report Format

# Exploitability Validation Report

## Summary
- Target: <target_path>
- Vulnerability Type: <vuln_type>
- Timestamp: <timestamp>

## Results
- Total functions analyzed: N
- Initial candidates: N
- After Stage A (One-Shot): N confirmed, N not_disproven, N disproven
- After Stage B (Process): N confirmed, N disproven
- After Stage C (Sanity): N passed, N failed (hallucinations)
- After Stage D (Ruling): N confirmed, N ruled out
- After Stage E (Feasibility): N exploitable, N constrained, N blocked, N not applicable

## Confirmed Findings

### FIND-001: <vuln_type> in <file>:<line>
- Function: <function_name>
- Proof: <code snippet>
- PoC: <poc description>
- Final Status: <EXPLOITABLE|CONFIRMED_CONSTRAINED|CONFIRMED_BLOCKED|CONFIRMED>
- Feasibility: <verdict if memory corruption>
- Chain Breaks: <list if applicable>
- Recommendation: <next steps>

## Ruled Out Findings
<list with reasons>

## Coverage
- checklist.json compliance: X/Y functions checked
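
The Coverage line of the report is what makes GATE-5 (No Sampling) auditable: X should equal Y. A minimal sketch of computing it, assuming a hypothetical checklist.json shape in which each file maps function names to entries with a `checked` flag:

```python
def coverage_line(checklist: dict) -> str:
    """Format the report's Coverage line from a checklist mapping."""
    entries = [entry for funcs in checklist.values() for entry in funcs.values()]
    checked = sum(1 for entry in entries if entry.get("checked"))
    return f"- checklist.json compliance: {checked}/{len(entries)} functions checked"
```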

Example Executions

Web Vulnerability

/validate /home/user/webapp --vuln-type command_injection

Phase 0: Created .out/exploitability-validation-20260122-143022/
Phase 1: Stage 0 complete - 15 files, 42 functions in checklist.json
Phase 2: Stage A complete - 3 candidates, 1 PoC success, 2 not_disproven
Phase 3: Stage B complete - 1 more confirmed, 1 disproven
Phase 4: Stage C complete - 2/2 passed sanity check
Phase 5: Stage D complete - 2/2 confirmed
Phase 6: Stage E skipped (command_injection is not memory corruption)
Phase 7: Report written to validation-report.md

Result: 2 CONFIRMED command injection vulnerabilities

Memory Corruption

/validate /home/user/binary_app --vuln-type format_string

Phase 0: Created .out/exploitability-validation-20260122-150000/
Phase 1: Stage 0 complete - 8 files, 23 functions in checklist.json
Phase 2: Stage A complete - 1 candidate, PoC shows %p leak works
Phase 3: Stage B skipped (PoC success in Stage A)
Phase 4: Stage C complete - 1/1 passed sanity check
Phase 5: Stage D complete - 1/1 confirmed
Phase 6: Stage E - Running exploit feasibility analysis...
         Binary: /home/user/binary_app/build/vuln
         Verdict: Difficult
         Chain breaks: Full RELRO (GOT blocked), glibc 2.38 (%n blocked)
         What would help: Older glibc, info leak for ASLR bypass
         Context saved: .out/.../exploit-context.json
Phase 7: Report written to validation-report.md

Result: 1 CONFIRMED_CONSTRAINED format string vulnerability
        Recommendation: Focus on info leak, or test in Docker with glibc 2.31

Error Handling

File not found: Stop, report which file, ask user for correct path
Stage fails: Report which stage, what failed, offer to retry or skip
No findings: Report “no exploitable vulnerabilities found” (valid outcome)
Sanity check failures: Report as potential hallucinations, continue with valid findings

Integration with /agentic

The /agentic command automatically runs exploitability validation (Phase 2) between scanning and analysis. Use --skip-validation to bypass it.

Related Agents

OffSec Specialist: Offensive security operations and vulnerability discovery
Crash Analysis: Analyze crashes from fuzzing campaigns

Related Personas

Exploit Developer: Generate working exploit proof-of-concepts
Binary Exploitation Specialist: Binary exploit generation methodology
