Overview

The /validate command validates that vulnerability findings are real, reachable, and exploitable before investing in exploit development. It runs a rigorous 6-stage validation pipeline to filter out false positives and assess true exploitability.
Validation prevents wasted effort on false positives. Use this before /exploit to ensure findings are genuine.

Syntax

/validate <target_path> [--vuln-type <type>] [--findings <file>] [--binary <path>] [--skip-feasibility]

Parameters

target_path (string, required): Directory or file to analyze
vuln-type (string, optional): Focus on a specific vulnerability type
findings (string, optional): Pre-existing findings.json to validate (skips discovery)
binary (string, optional): Path to a compiled binary for Stage E feasibility analysis
skip-feasibility (boolean, optional): Skip Stage E even for memory corruption vulnerabilities

Validation Stages

All stages are mandatory. Execute in sequence: 0 → A → B → C → D → E

Stage 0: Inventory (Python)

Build checklist of all code to analyze.
```python
from packages.exploitability_validation import build_checklist

checklist = build_checklist(target_path, output_dir)
```
Output: checklist.json

Stage A: One-Shot Analysis (Claude)

Quick vulnerability identification:
  • Read source files
  • Look for injection, overflow, UAF, format string, deserialization
  • Note: file, line, function, vuln_type, proof (actual code)
Output: findings.json with status “pending” or “not_disproven”
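A single Stage A finding might be recorded as follows. This is an illustrative sketch: the field names follow the list above (file, line, function, vuln_type, proof), but the exact schema is an assumption, not the tool's documented format.

```python
import json

# Illustrative findings.json entry; field names follow the Stage A
# checklist above, the exact schema is an assumption.
finding = {
    "file": "webapp/handlers/upload.py",
    "line": 42,
    "function": "handle_upload",
    "vuln_type": "command_injection",
    "proof": 'os.system("convert " + filename)',  # actual code from the file
    "status": "pending",  # or "not_disproven"
}

print(json.dumps(finding, indent=2))
```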

Stage B: Process Analysis (Claude)

Stage B is critical. This is where superficial scanning becomes thorough validation.
Systematic analysis with attack trees:
  1. Build attack surface: sources, sinks, trust boundaries → attack-surface.json
  2. Build attack tree: knowledge graph of attack paths → attack-tree.json
  3. Form hypotheses: testable predictions for each finding → hypotheses.json
  4. Test hypotheses: gather evidence, verify predictions
  5. Track failures: why approaches didn’t work → disproven.json
  6. Track proximity: how close to exploitation (0-10 scale) → attack-paths.json
Stage B produces 5 working documents:
attack-surface.json  - Sources, sinks, trust boundaries
attack-tree.json     - Attack knowledge graph
hypotheses.json      - Testable predictions (status: testing/confirmed/disproven)
disproven.json       - Failed approaches and why
attack-paths.json    - Paths tried, PROXIMITY scores, blockers
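As a sketch of what one hypotheses.json entry might contain (the field names are illustrative assumptions based on the Stage B workflow above, not a documented schema):

```python
import json

# Illustrative hypotheses.json entry; fields are assumptions drawn
# from the Stage B steps: hypothesis, testable predictions, status.
hypothesis = {
    "id": "H1",
    "finding": "F3",
    "statement": "User-controlled path reaches os.path.join without normalization",
    "predictions": [
        {"id": "P1.1", "text": "handler is reachable from the router", "status": "confirmed"},
        {"id": "P1.2", "text": "no sanitization between source and sink", "status": "testing"},
    ],
    "status": "testing",  # testing | confirmed | disproven
}

print(json.dumps(hypothesis, indent=2))
```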

Stage C: Sanity Check (Claude)

Verify against actual code:
  • Confirm file exists at stated path
  • Confirm vulnerable code exists at stated line (VERBATIM)
  • Confirm source→sink flow is real
  • Confirm code is reachable (called from main/handler)
Output: Update findings.json with sanity_check field
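The first two Stage C checks are mechanical enough to sketch in code. The helper below is hypothetical (not part of the tool's packages); it verifies that the file exists and that the stated line contains the proof verbatim:

```python
from pathlib import Path

def sanity_check(file_path: str, line_no: int, proof: str) -> dict:
    """Confirm the file exists and the stated line contains the proof verbatim.

    Hypothetical helper illustrating the first two Stage C checks.
    """
    path = Path(file_path)
    if not path.is_file():
        return {"file_exists": False, "line_matches": False}
    lines = path.read_text().splitlines()
    in_range = 1 <= line_no <= len(lines)
    return {
        "file_exists": True,
        "line_matches": in_range and proof in lines[line_no - 1],
    }

# Demo against a throwaway file
demo = Path("demo_vuln.py")
demo.write_text('import os\nos.system("convert " + filename)\n')
result = sanity_check("demo_vuln.py", 2, 'os.system(')
print(result)  # {'file_exists': True, 'line_matches': True}
demo.unlink()
```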

Stage D: Ruling (Claude)

Make final determinations:
  • Rule out test code, dead code, already-mitigated code
  • Check for preconditions that prevent exploitation
  • Apply hypothesis results from Stage B
  • Final status: Exploitable, Confirmed, or Ruled Out
Output: Update findings.json with ruling and final_status fields

Stage E: Feasibility (Python)

For memory corruption only:
```python
from packages.exploit_feasibility import analyze_binary

result = analyze_binary(binary_path, vuln_type='buffer_overflow')
```
Output: exploit-context.json

Final status after Stage E:

| Verdict | Final Status | Meaning |
|---|---|---|
| Likely | Exploitable | Clear path to code execution |
| Difficult | Confirmed (Constrained) | Primitives exist but hard to chain |
| Unlikely | Confirmed (Blocked) | No viable path with current mitigations |
| N/A | Confirmed | Web/injection vuln (Stage E skipped) |
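The verdict-to-status mapping above is simple enough to express as a lookup (a sketch; the pipeline's internal representation may differ):

```python
# Mapping from Stage E verdict to final finding status, per the
# table above. The dict itself is illustrative, not the tool's API.
FINAL_STATUS = {
    "Likely": "Exploitable",
    "Difficult": "Confirmed (Constrained)",
    "Unlikely": "Confirmed (Blocked)",
    "N/A": "Confirmed",  # web/injection findings skip Stage E
}

print(FINAL_STATUS["Likely"])  # Exploitable
```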

Execution Models

Non-Agentic Mode (Claude Code)

When user runs /validate <path>:
  1. You are the LLM - perform the analysis yourself
  2. Run Stage 0 via Python (inventory) → checklist.json
  3. Stage A: Read files, identify vulnerabilities → findings.json
  4. Stage B: Build attack trees, form & test hypotheses → 5 working docs
  5. Stage C: Verify findings against actual code
  6. Stage D: Make rulings based on Stage B evidence
  7. Run Stage E via Python if binary provided

Agentic Mode (Python Orchestration)

When user runs python3 raptor.py agentic --repo <path>:
  1. Semgrep/CodeQL scan first - produces SARIF files
  2. SARIF conversion - deduplicates findings
  3. If LLM API available - runs full validation pipeline via API calls
  4. If no LLM API is available - deduplication only; the LLM validation stages are skipped rather than run as theater
  5. Stage E - runs if binary provided

Vulnerability Types

  • command_injection: OS command injection
  • sql_injection: SQL injection
  • xss: Cross-site scripting
  • path_traversal: Directory traversal
  • ssrf: Server-side request forgery
  • deserialization: Insecure deserialization
  • buffer_overflow: Buffer overflow (memory corruption)
  • format_string: Format string vulnerabilities

Examples

Validate Web Application

/validate ./webapp --vuln-type command_injection
Scans for command injection vulnerabilities.

Validate All Vulnerability Types

/validate ./src
Comprehensive validation of all vulnerability classes.

Validate Pre-Existing Findings

/validate ./src --findings scanner-results.json
Validates findings from external scanners.

Validate Memory Corruption with Binary

/validate ./vuln_app --vuln-type format_string --binary ./build/vuln
Includes Stage E feasibility analysis.

Skip Feasibility Analysis

/validate ./vuln_app --vuln-type buffer_overflow --skip-feasibility
Skips Stage E (not recommended for memory corruption).

Output Structure

.out/exploitability-validation-<timestamp>/
├── checklist.json           # All functions to check
├── findings.json            # Final validated findings
├── attack-tree.json         # Attack knowledge graph
├── hypotheses.json          # Tested hypotheses
├── disproven.json           # Failed approaches
├── attack-paths.json        # Paths tried + PROXIMITY
├── attack-surface.json      # Sources, sinks, boundaries
├── exploit-context.json     # Binary context (Stage E, if applicable)
└── validation-report.md     # Human-readable summary

Stage B: Why It Matters

If you’re tempted to skip Stage B because findings “obviously” look like false positives, don’t.
| Without Stage B | With Stage B |
|---|---|
| Quick ruling based on gut feel | Evidence-backed ruling from tested hypotheses |
| "Looks like a false positive" | "Hypothesis H2 disproven: ws:// only in comment (evidence: line 463)" |
| No record of what was tried | disproven.json documents failed approaches |
| No proximity tracking | PROXIMITY scores show how close to exploitation |
Create hypotheses even for “obvious” false positives:
  1. Create the hypothesis (e.g., “H1: SSRF via urlretrieve”)
  2. List testable predictions (e.g., “P1.1: Script runs at runtime”)
  3. Gather evidence to disprove (e.g., “Script outputs .h file → build-time only”)
  4. Record in disproven.json with lesson learned
This creates an audit trail and catches cases where “obvious” false positives are actually exploitable.
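Following the four steps above, a disproven.json record might look like this. The example reuses the SSRF hypothesis from the list; the field names are an assumed schema for illustration:

```python
import json

# Illustrative disproven.json record following the four steps above;
# field names are an assumption, not the tool's documented schema.
record = {
    "hypothesis": "H1: SSRF via urlretrieve",
    "predictions": ["P1.1: Script runs at runtime"],
    "evidence": "Script outputs .h file, so it is build-time only",
    "status": "disproven",
    "lesson": "Confirm a script runs at runtime before assuming its inputs are attacker-reachable",
}

print(json.dumps(record, indent=2))
```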

MUST-GATEs

This command enforces strict validation gates:
  1. ASSUME-EXPLOIT: Investigate as if exploitable until proven otherwise
  2. STRICT-SEQUENCE: Follow methodology, additional ideas presented separately
  3. CHECKLIST: Track coverage compliance
  4. NO-HEDGING: Verify all “if/maybe/uncertain” claims
  5. FULL-COVERAGE: Check ALL code, no sampling
  6. PROOF: Show vulnerable code for every finding

When to Use

  • After /scan or /agentic produces findings
  • Before investing time in /exploit development
  • When you suspect false positives from scanners
  • To validate third-party security reports

Workflow Integration

/scan -> /validate -> /exploit
   |         |           |
   v         v           v
 Finds    Confirms    Develops
 vulns    they're     working
          real        exploits

/scan

Generate findings to validate

/exploit

Generate exploits after validation

/agentic

Full workflow with automatic validation

/crash-analysis

Root-cause analysis for crashes

Notes

  • Stage E only runs for memory corruption vulnerabilities
  • Web vulnerabilities skip directly to final output after Stage D
  • Validation adds 5-10 minutes but prevents wasted effort
  • Use before exploit development to ensure findings are real
  • Produces audit trail of investigation process
