The Exploitability Validator agent orchestrates a multi-stage pipeline that validates vulnerability findings before exploit development, preventing wasted effort on false positives and theoretical vulnerabilities.

Purpose

Validate that findings:
  1. Actually exist (not hallucinated)
  2. Are reachable (not dead code)
  3. Have working exploitation paths (not just theoretical)

Invocation

/validate <target_path> [--vuln-type <type>] [--findings <findings.json>]
Parameters:
  • target_path: Directory or file to analyze
  • --vuln-type: Optional focus (e.g., command_injection, sql_injection, xss)
  • --findings: Optional pre-existing findings to validate (skips Stage 0/A)
Examples:
/validate /home/user/webapp --vuln-type command_injection
/validate /home/user/binary_app --vuln-type format_string
/validate /path/to/code --findings initial_scan.json
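As a rough illustration of how the parameters above fit together, the sketch below parses a `/validate` command line with Python's standard `argparse`. The function name `parse_validate_command` and the returned namespace layout are assumptions made for this example, not part of the agent itself.

```python
import argparse
import shlex


def parse_validate_command(command: str) -> argparse.Namespace:
    """Parse a '/validate ...' command string into its parameters (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="/validate")
    parser.add_argument("target_path", help="Directory or file to analyze")
    parser.add_argument("--vuln-type", default=None,
                        help="Optional focus, e.g. command_injection")
    parser.add_argument("--findings", default=None,
                        help="Optional pre-existing findings file")

    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/validate":
        raise ValueError("expected a command starting with /validate")
    return parser.parse_args(tokens[1:])


args = parse_validate_command("/validate /path/to/code --findings initial_scan.json")
# Pre-existing findings mean Stage 0 and Stage A are skipped.
skip_stage_0_and_a = args.findings is not None
```

Note that `argparse` exposes `--vuln-type` as the attribute `vuln_type`.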

Pipeline Stages

  1. Stage 0: Inventory. Build complete function inventory with checklist.json
  2. Stage A: One-Shot. Quick verification - attempt PoC for each candidate
  3. Stage B: Systematic Process. Build attack trees and test hypotheses for unproven findings
  4. Stage C: Sanity Check. Validate findings against actual code to catch hallucinations
  5. Stage D: Ruling. Filter out test code and unrealistic preconditions
  6. Stage E: Feasibility. Run exploit feasibility analysis for memory corruption (binary analysis)
  7. Final Report. Generate comprehensive validation report

Shared Context (MUST-GATEs)

Before executing ANY stage, load: .claude/skills/exploitability-validation/SKILL.md
This contains:
  • [CONFIG]: Configuration settings
  • [EXEC]: Execution rules
  • [GATES]: MUST-GATEs 1-6 that apply to ALL stages
  • [REMIND]: Critical reminders

MUST-GATEs Overview

  1. Treat all findings as exploitable until proven otherwise. Burden of proof is on disproving, not proving.
  2. Verify all uncertain claims. No “likely”, “probably”, “appears to” without verification.
  3. Update working documents after every action. Maintain audit trail.
  4. Every claim needs evidence. No assumptions without verification.
  5. Check ALL code per checklist.json. No random sampling or incomplete coverage.
  6. Working PoC or concrete disproof required. No theoretical assessments.

Stage Details

Stage 0: Inventory

Load: .claude/skills/exploitability-validation/stage-0-inventory.md
Execution:
  1. Enumerate all files in target_path
  2. Exclude test/mock files
  3. Extract functions per file
  4. Write checklist.json
Output: checklist.json with complete function inventory

Stage A: One-Shot

Load: .claude/skills/exploitability-validation/stage-a-oneshot.md
Execution:
  1. Assess each function for vuln_type
  2. Attempt PoC for candidates
  3. Write findings.json
Routing:
  • All PoCs succeed → Skip to Stage C
  • Some “not_disproven” → Continue to Stage B
  • All disproven → Report “no exploitable findings” and exit
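The routing rules above can be expressed as a small decision function. The sketch assumes each finding carries one of the statuses `confirmed`, `not_disproven`, or `disproven` (the names used in the report format); the function name and return values are hypothetical. The source does not say what happens when findings are a mix of confirmed and disproven with nothing left open, so the sketch assumes the confirmed ones proceed to Stage C.

```python
from typing import Iterable


def route_after_stage_a(statuses: Iterable[str]) -> str:
    """Pick the next step from the Stage A statuses of all candidates."""
    statuses = list(statuses)
    if any(status == "not_disproven" for status in statuses):
        # At least one finding is still open: continue with the systematic process.
        return "stage_b"
    if any(status == "confirmed" for status in statuses):
        # Nothing is left open and at least one PoC succeeded (assumption for mixed results).
        return "stage_c"
    # Every candidate was disproven, or there were no candidates at all.
    return "exit_no_exploitable_findings"
```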

Stage B: Systematic Process

Load: .claude/skills/exploitability-validation/stage-b-process.md
Execution:
  1. Build attack trees for “not_disproven” findings
  2. Form and test hypotheses
  3. Track PROXIMITY metrics
  4. Attempt multiple attack paths
  5. Update working documents
Output:
  • findings.json (updated)
  • attack-tree.json
  • hypotheses.json
  • disproven.json
  • attack-paths.json
  • attack-surface.json

Stage C: Sanity Check

Load: .claude/skills/exploitability-validation/stage-c-sanity.md
Execution:
  1. Verify files exist
  2. Verify code matches verbatim
  3. Verify flow is real
  4. Verify code is reachable
Updates: findings.json with sanity_check results
Findings that fail the sanity check (hallucinations) are removed from active consideration.
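The first two checks (file exists, code matches verbatim) are mechanical and can be sketched as below. The finding fields `file`, `line`, and `snippet` and the result layout are assumptions for illustration; verifying that the flow is real and the code is reachable needs actual analysis and is not covered here.

```python
from pathlib import Path


def sanity_check(finding: dict) -> dict:
    """Check that the cited file exists and the quoted code appears verbatim at the cited line."""
    path = Path(finding["file"])
    if not path.is_file():
        return {"passed": False, "reason": "file does not exist"}

    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    snippet = finding["snippet"].splitlines()
    start = finding["line"] - 1  # findings are assumed to use 1-based line numbers

    # Strict comparison: whitespace differences count as a mismatch.
    if start < 0 or not snippet or lines[start:start + len(snippet)] != snippet:
        return {"passed": False, "reason": "code does not match verbatim"}
    return {"passed": True, "reason": "file exists and code matches"}
```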

Stage D: Ruling

Load: .claude/skills/exploitability-validation/stage-d-ruling.md
Execution:
  1. Check for test/mock/example code
  2. Check for unrealistic preconditions
  3. Check for hedging language
Output: findings.json with CONFIRMED findings only
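The hedging-language check could be approximated with a simple pattern match over a finding's written justification. The phrase list below is taken from the MUST-GATEs overview ("likely", "probably", "appears to"); the full list in stage-d-ruling.md may well be longer, and the `find_hedging` name is hypothetical.

```python
import re
from typing import List

# Phrases named in the MUST-GATEs; word boundaries keep "unlikely" from matching "likely".
HEDGING_PATTERN = re.compile(r"\b(likely|probably|appears to)\b", re.IGNORECASE)


def find_hedging(text: str) -> List[str]:
    """Return every hedging phrase found in the text, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in HEDGING_PATTERN.finditer(text)]


claim = "Input probably reaches system() and appears to be unsanitized"
hedges = find_hedging(claim)  # a non-empty result means the claim still needs verification
```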

Stage E: Feasibility (Memory Corruption Only)

Load: .claude/skills/exploitability-validation/stage-e-feasibility.md
Applies to:
  • buffer_overflow
  • heap_overflow
  • format_string
  • use_after_free
  • double_free
  • integer_overflow
  • out_of_bounds_read/write
Skip for:
  • command_injection
  • sql_injection
  • xss
  • path_traversal
  • ssrf
  • deserialization
Execution:
from packages.exploit_feasibility import (
    analyze_binary,
    format_analysis_summary,
    save_exploit_context
)

# Assumed to be provided by the caller:
#   confirmed_findings      - findings that survived Stage D
#   binary_path             - path to the compiled target
#   MEMORY_CORRUPTION_TYPES - the vulnerability types in the "Applies to" list
for finding in confirmed_findings:
    # Other vulnerability types skip Stage E entirely.
    if finding.vuln_type in MEMORY_CORRUPTION_TYPES:
        result = analyze_binary(binary_path, vuln_type=finding.vuln_type)
        context_file = save_exploit_context(binary_path)

        # Attach the feasibility verdict and supporting details to the finding.
        finding.feasibility = {
            'verdict': result.verdict,  # Likely, Difficult, Unlikely
            'chain_breaks': result.chain_breaks,
            'what_would_help': result.what_would_help,
            'context_file': context_file
        }

        # Update final status
        if result.verdict == 'Likely':
            finding.final_status = 'EXPLOITABLE'
        elif result.verdict == 'Difficult':
            finding.final_status = 'CONFIRMED_CONSTRAINED'
        else:
            finding.final_status = 'CONFIRMED_BLOCKED'

Working Directory Structure

.out/exploitability-validation-20260304_140000/
├── checklist.json                    # Stage 0 output
├── findings.json                      # Updated through stages
├── attack-tree.json                   # Stage B
├── hypotheses.json                    # Stage B
├── disproven.json                     # Stage B
├── attack-paths.json                  # Stage B
├── attack-surface.json                # Stage B
├── exploit-context.json               # Stage E (if applicable)
└── validation-report.md               # Final report
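Creating the timestamped working directory might look like the sketch below. The directory name follows the `YYYYMMDD_HHMMSS` pattern shown in the listing above; the function name and the `now` parameter (included so the result is reproducible) are assumptions for this example.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def create_working_dir(base: str = ".out", now: Optional[datetime] = None) -> Path:
    """Create <base>/exploitability-validation-<timestamp>/ and return its path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    workdir = Path(base) / f"exploitability-validation-{stamp}"
    # exist_ok=False: refuse to reuse a directory from an earlier run.
    workdir.mkdir(parents=True, exist_ok=False)
    return workdir
```

Failing on an existing directory keeps the audit trail of one run from being overwritten by another.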

Final Report Format

# Exploitability Validation Report

## Summary
- Target: <target_path>
- Vulnerability Type: <vuln_type>
- Timestamp: <timestamp>

## Results
- Total functions analyzed: N
- Initial candidates: N
- After Stage A (One-Shot): N confirmed, N not_disproven, N disproven
- After Stage B (Process): N confirmed, N disproven
- After Stage C (Sanity): N passed, N failed (hallucinations)
- After Stage D (Ruling): N confirmed, N ruled out
- After Stage E (Feasibility): N exploitable, N constrained, N blocked, N not applicable

## Confirmed Findings

### FIND-001: <vuln_type> in <file>:<line>
- Function: <function_name>
- Proof: <code snippet>
- PoC: <poc description>
- Final Status: <EXPLOITABLE|CONFIRMED_CONSTRAINED|CONFIRMED_BLOCKED|CONFIRMED>
- Feasibility: <verdict if memory corruption>
- Chain Breaks: <list if applicable>
- Recommendation: <next steps>

## Ruled Out Findings
<list with reasons>

## Coverage
- checklist.json compliance: X/Y functions checked

Example Executions

/validate /home/user/webapp --vuln-type command_injection

Phase 0: Created .out/exploitability-validation-20260122-143022/
Phase 1: Stage 0 complete - 15 files, 42 functions in checklist.json
Phase 2: Stage A complete - 3 candidates, 1 PoC success, 2 not_disproven
Phase 3: Stage B complete - 1 more confirmed, 1 disproven
Phase 4: Stage C complete - 2/2 passed sanity check
Phase 5: Stage D complete - 2/2 confirmed
Phase 6: Stage E skipped (command_injection is not memory corruption)
Phase 7: Report written to validation-report.md

Result: 2 CONFIRMED command injection vulnerabilities

Error Handling

  • File not found: Stop, report which file, ask user for correct path
  • Stage fails: Report which stage, what failed, offer to retry or skip
  • No findings: Report “no exploitable vulnerabilities found” (valid outcome)
  • Sanity check failures: Report as potential hallucinations, continue with valid findings

Integration with /agentic

The /agentic command now automatically runs exploitability validation (Phase 2) between scanning and analysis. Use --skip-validation to bypass.
