Overview

The /validate command validates that vulnerability findings are real, reachable, and exploitable before investing in exploit development. It runs a rigorous 6-stage validation pipeline to filter out false positives and assess true exploitability.
Validation prevents wasted effort on false positives. Use this before /exploit to ensure findings are genuine.

Syntax

/validate <target_path> [--vuln-type <type>] [--findings <file>] [--binary <path>] [--skip-feasibility]

Parameters

target_path (string, required): Directory or file to analyze
vuln-type (string, optional): Focus on a specific vulnerability type
findings (string, optional): Pre-existing findings.json to validate (skips discovery)
binary (string, optional): Path to a compiled binary for Stage E feasibility analysis
skip-feasibility (boolean, optional): Skip Stage E even for memory corruption vulnerabilities

Validation Stages

All stages are mandatory. Execute in sequence: 0 → A → B → C → D → E

Stage 0: Inventory (Python)

Build checklist of all code to analyze.
```python
from packages.exploitability_validation import build_checklist

checklist = build_checklist(target_path, output_dir)
```
Output: checklist.json

Stage A: One-Shot Analysis (Claude)

Quick vulnerability identification:
  • Read source files
  • Look for injection, overflow, UAF, format string, deserialization
  • Note: file, line, function, vuln_type, proof (actual code)
Output: findings.json with status “pending” or “not_disproven”
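A single Stage A finding might be recorded as follows. This is an illustrative sketch: the field names follow the list above (file, line, function, vuln_type, proof), but the exact schema is an assumption, not the tool's documented format.

```python
import json

# Illustrative findings.json entry; field names follow the Stage A
# checklist above, the exact schema is an assumption.
finding = {
    "file": "webapp/handlers/upload.py",
    "line": 42,
    "function": "handle_upload",
    "vuln_type": "command_injection",
    "proof": 'os.system("convert " + filename)',  # actual code from the file
    "status": "pending",  # or "not_disproven"
}

print(json.dumps(finding, indent=2))
```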

Stage B: Process Analysis (Claude)

Stage B is critical. This is where superficial scanning becomes thorough validation.
Systematic analysis with attack trees:
  1. Build attack surface: sources, sinks, trust boundaries → attack-surface.json
  2. Build attack tree: knowledge graph of attack paths → attack-tree.json
  3. Form hypotheses: testable predictions for each finding → hypotheses.json
  4. Test hypotheses: gather evidence, verify predictions
  5. Track failures: why approaches didn’t work → disproven.json
  6. Track proximity: how close to exploitation (0-10 scale) → attack-paths.json
Stage B produces 5 working documents:
attack-surface.json  - Sources, sinks, trust boundaries
attack-tree.json     - Attack knowledge graph
hypotheses.json      - Testable predictions (status: testing/confirmed/disproven)
disproven.json       - Failed approaches and why
attack-paths.json    - Paths tried, PROXIMITY scores, blockers
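As a sketch of what one hypotheses.json entry might contain (the field names are illustrative assumptions based on the Stage B workflow above, not a documented schema):

```python
import json

# Illustrative hypotheses.json entry; fields are assumptions drawn
# from the Stage B steps: hypothesis, testable predictions, status.
hypothesis = {
    "id": "H1",
    "finding": "F3",
    "statement": "User-controlled path reaches os.path.join without normalization",
    "predictions": [
        {"id": "P1.1", "text": "handler is reachable from the router", "status": "confirmed"},
        {"id": "P1.2", "text": "no sanitization between source and sink", "status": "testing"},
    ],
    "status": "testing",  # testing | confirmed | disproven
}

print(json.dumps(hypothesis, indent=2))
```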

Stage C: Sanity Check (Claude)

Verify against actual code:
  • Confirm file exists at stated path
  • Confirm vulnerable code exists at stated line (VERBATIM)
  • Confirm source→sink flow is real
  • Confirm code is reachable (called from main/handler)
Output: Update findings.json with sanity_check field
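The first two Stage C checks are mechanical enough to sketch in code. The helper below is hypothetical (not part of the tool's packages); it verifies that the file exists and that the stated line contains the proof verbatim:

```python
from pathlib import Path

def sanity_check(file_path: str, line_no: int, proof: str) -> dict:
    """Confirm the file exists and the stated line contains the proof verbatim.

    Hypothetical helper illustrating the first two Stage C checks.
    """
    path = Path(file_path)
    if not path.is_file():
        return {"file_exists": False, "line_matches": False}
    lines = path.read_text().splitlines()
    in_range = 1 <= line_no <= len(lines)
    return {
        "file_exists": True,
        "line_matches": in_range and proof in lines[line_no - 1],
    }

# Demo against a throwaway file
demo = Path("demo_vuln.py")
demo.write_text('import os\nos.system("convert " + filename)\n')
result = sanity_check("demo_vuln.py", 2, 'os.system(')
print(result)  # {'file_exists': True, 'line_matches': True}
demo.unlink()
```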

Stage D: Ruling (Claude)

Make final determinations:
  • Rule out test code, dead code, already-mitigated code
  • Check for preconditions that prevent exploitation
  • Apply hypothesis results from Stage B
  • Final status: Exploitable, Confirmed, or Ruled Out
Output: Update findings.json with ruling and final_status fields

Stage E: Feasibility (Python)

For memory corruption only:
```python
from packages.exploit_feasibility import analyze_binary

result = analyze_binary(binary_path, vuln_type='buffer_overflow')
```
Output: exploit-context.json

Final status after Stage E:

| Verdict | Final Status | Meaning |
|---|---|---|
| Likely | Exploitable | Clear path to code execution |
| Difficult | Confirmed (Constrained) | Primitives exist but hard to chain |
| Unlikely | Confirmed (Blocked) | No viable path with current mitigations |
| N/A | Confirmed | Web/injection vuln (Stage E skipped) |
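The verdict-to-status mapping above is simple enough to express as a lookup (a sketch; the pipeline's internal representation may differ):

```python
# Mapping from Stage E verdict to final finding status, per the
# table above. The dict itself is illustrative, not the tool's API.
FINAL_STATUS = {
    "Likely": "Exploitable",
    "Difficult": "Confirmed (Constrained)",
    "Unlikely": "Confirmed (Blocked)",
    "N/A": "Confirmed",  # web/injection findings skip Stage E
}

print(FINAL_STATUS["Likely"])  # Exploitable
```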

Execution Models

Non-Agentic Mode (Claude Code)

When user runs /validate <path>:
  1. You are the LLM - perform the analysis yourself
  2. Run Stage 0 via Python (inventory) → checklist.json
  3. Stage A: Read files, identify vulnerabilities → findings.json
  4. Stage B: Build attack trees, form & test hypotheses → 5 working docs
  5. Stage C: Verify findings against actual code
  6. Stage D: Make rulings based on Stage B evidence
  7. Run Stage E via Python if binary provided

Agentic Mode (Python Orchestration)

When user runs python3 raptor.py agentic --repo <path>:
  1. Semgrep/CodeQL scan first - produces SARIF files
  2. SARIF conversion - deduplicates findings
  3. If LLM API available - runs full validation pipeline via API calls
  4. If no LLM API is available - deduplication only; the LLM validation stages are skipped rather than run as theater
  5. Stage E - runs if binary provided

Vulnerability Types

  • command_injection: OS command injection
  • sql_injection: SQL injection
  • xss: Cross-site scripting
  • path_traversal: Directory traversal
  • ssrf: Server-side request forgery
  • deserialization: Insecure deserialization
  • buffer_overflow: Buffer overflow (memory corruption)
  • format_string: Format string vulnerabilities

Examples

Validate Web Application

/validate ./webapp --vuln-type command_injection
Scans for command injection vulnerabilities.

Validate All Vulnerability Types

/validate ./src
Comprehensive validation of all vulnerability classes.

Validate Pre-Existing Findings

/validate ./src --findings scanner-results.json
Validates findings from external scanners.

Validate Memory Corruption with Binary

/validate ./vuln_app --vuln-type format_string --binary ./build/vuln
Includes Stage E feasibility analysis.

Skip Feasibility Analysis

/validate ./vuln_app --vuln-type buffer_overflow --skip-feasibility
Skips Stage E (not recommended for memory corruption).

Output Structure

.out/exploitability-validation-<timestamp>/
├── checklist.json           # All functions to check
├── findings.json            # Final validated findings
├── attack-tree.json         # Attack knowledge graph
├── hypotheses.json          # Tested hypotheses
├── disproven.json           # Failed approaches
├── attack-paths.json        # Paths tried + PROXIMITY
├── attack-surface.json      # Sources, sinks, boundaries
├── exploit-context.json     # Binary context (Stage E, if applicable)
└── validation-report.md     # Human-readable summary

Stage B: Why It Matters

If you’re tempted to skip Stage B because findings “obviously” look like false positives, don’t.
| Without Stage B | With Stage B |
|---|---|
| Quick ruling based on gut feel | Evidence-backed ruling from tested hypotheses |
| "Looks like a false positive" | "Hypothesis H2 disproven: ws:// only in comment (evidence: line 463)" |
| No record of what was tried | disproven.json documents failed approaches |
| No proximity tracking | PROXIMITY scores show how close to exploitation |
Create hypotheses even for “obvious” false positives:
  1. Create the hypothesis (e.g., “H1: SSRF via urlretrieve”)
  2. List testable predictions (e.g., “P1.1: Script runs at runtime”)
  3. Gather evidence to disprove (e.g., “Script outputs .h file → build-time only”)
  4. Record in disproven.json with lesson learned
This creates an audit trail and catches cases where “obvious” false positives are actually exploitable.
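Following the four steps above, a disproven.json record might look like this. The example reuses the SSRF hypothesis from the list; the field names are an assumed schema for illustration:

```python
import json

# Illustrative disproven.json record following the four steps above;
# field names are an assumption, not the tool's documented schema.
record = {
    "hypothesis": "H1: SSRF via urlretrieve",
    "predictions": ["P1.1: Script runs at runtime"],
    "evidence": "Script outputs .h file, so it is build-time only",
    "status": "disproven",
    "lesson": "Confirm a script runs at runtime before assuming its inputs are attacker-reachable",
}

print(json.dumps(record, indent=2))
```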

MUST-GATEs

This command enforces strict validation gates:
  1. ASSUME-EXPLOIT: Investigate as if exploitable until proven otherwise
  2. STRICT-SEQUENCE: Follow methodology, additional ideas presented separately
  3. CHECKLIST: Track coverage compliance
  4. NO-HEDGING: Verify all “if/maybe/uncertain” claims
  5. FULL-COVERAGE: Check ALL code, no sampling
  6. PROOF: Show vulnerable code for every finding

When to Use

  • After /scan or /agentic produces findings
  • Before investing time in /exploit development
  • When you suspect false positives from scanners
  • To validate third-party security reports

Workflow Integration

/scan -> /validate -> /exploit
   |         |           |
   v         v           v
 Finds    Confirms    Develops
 vulns    they're     working
          real        exploits

/scan

Generate findings to validate

/exploit

Generate exploits after validation

/agentic

Full workflow with automatic validation

/crash-analysis

Root-cause analysis for crashes

Notes

  • Stage E only runs for memory corruption vulnerabilities
  • Web vulnerabilities skip directly to final output after Stage D
  • Validation adds 5-10 minutes but prevents wasted effort
  • Use before exploit development to ensure findings are real
  • Produces audit trail of investigation process
