Overview
The exploitability validation pipeline ensures that security findings are not false positives: a 6-stage process systematically verifies that each finding is real, reachable, and exploitable.
Why Validation?
Static analysis tools produce many findings, but not all are exploitable:
- Hallucinated findings: File doesn’t exist, code doesn’t match scanner output
- Unreachable code: Dead code, test-only functions
- Protected paths: Effective sanitization, impossible preconditions
- Binary constraints: Mitigations that block exploitation
Validation prevents wasted effort on false positives.
The 6-Stage Pipeline
┌─────────────────────────────────────────────────────┐
│ Stage 0: Inventory │
│ Build ground truth checklist of all functions │
└─────────────┬───────────────────────────────────────┘
│ checklist.json
▼
┌─────────────────────────────────────────────────────┐
│ Stage A: One-Shot Analysis │
│ Quick exploitability check + PoC attempt │
└─────────────┬───────────────────────────────────────┘
│ findings.json (status: pending/not_disproven)
▼
┌─────────────────────────────────────────────────────┐
│ Stage B: Process │
│ Systematic analysis with attack trees │
└─────────────┬───────────────────────────────────────┘
│ 5 working documents
▼
┌─────────────────────────────────────────────────────┐
│ Stage C: Sanity Check │
│ Validate against actual source code │
└─────────────┬───────────────────────────────────────┘
│ sanity_check added to findings
▼
┌─────────────────────────────────────────────────────┐
│ Stage D: Ruling │
│ Filter based on practical exploitation criteria │
└─────────────┬───────────────────────────────────────┘
│ ruling & confirmed findings
▼
┌────┴────┐
│ │
▼ ▼
Memory Web/Injection
Corruption (Done)
│
▼
┌─────────────────────────────────────────────────────┐
│ Stage E: Feasibility │
│ Binary constraint analysis for memory corruption │
└─────────────┬───────────────────────────────────────┘
│ final_status & feasibility
▼
validation-report.md
Stage 0: Inventory
Purpose: Build a complete checklist of all code to be analyzed.
Output
checklist.json - Complete function inventory:
{
"generated_at": "2026-03-04T12:00:00Z",
"target_path": "/path/to/code",
"total_files": 42,
"total_functions": 256,
"files": [
{
"path": "src/parser.c",
"functions": [
{
"name": "parse_header",
"line_start": 120,
"line_end": 145,
"checked": false
}
]
}
]
}
Execution
from packages.exploitability_validation.checklist_builder import build_checklist
checklist = build_checklist(
target_path="/path/to/code",
workdir=".out/validation/",
exclude_patterns=["*_test.*", "test_*"]
)
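To make the checklist's role in GATE-5 full-coverage tracking concrete, here is a minimal sketch that reads a checklist.json in the schema shown above and reports which functions are still unchecked. The `coverage_report` helper is hypothetical, not part of the package:

```python
import json

def coverage_report(checklist_path):
    """Summarize which functions still need analysis (hypothetical helper)."""
    with open(checklist_path) as f:
        checklist = json.load(f)
    unchecked = [
        f"{entry['path']}:{fn['name']}"
        for entry in checklist["files"]
        for fn in entry["functions"]
        if not fn["checked"]
    ]
    total = checklist["total_functions"]
    return {"unchecked": unchecked, "coverage": 1 - len(unchecked) / total}
```

A report like this is one way to enforce "no sampling" later in Stage C: the pipeline is not done until `coverage` reaches 1.0.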
Stage A: One-Shot Analysis
Purpose: Quick exploitability assessment with PoC attempts.
Gates Applied
- GATE-1 [ASSUME-EXPLOIT]: Assume findings are exploitable until proven otherwise
- GATE-4 [NO-HEDGING]: No “maybe” or “could be” - verify all claims
- GATE-6 [PROOF]: Provide concrete proof and vulnerable code
Output
findings.json - Initial exploitability assessment:
{
"stage": "A",
"timestamp": "2026-03-04T12:30:00Z",
"findings": [
{
"id": "FINDING-0001",
"file": "src/parser.c",
"line": 134,
"function": "parse_header",
"vuln_type": "buffer_overflow",
"status": "not_disproven",
"message": "Unbounded strcpy into fixed buffer",
"proof": "strcpy(buf, header);",
"poc_attempted": true,
"poc_result": "crash with SIGSEGV"
}
]
}
Status Values
poc_success - PoC successfully demonstrated vulnerability
not_disproven - Cannot rule out, needs deeper analysis (Stage B)
disproven - Proven safe, no further analysis needed
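The three status values route each finding to its next step: `disproven` findings stop here, `poc_success` findings are already demonstrated, and `not_disproven` findings proceed to Stage B. A minimal sketch of that routing (the `route_findings` helper is hypothetical, assuming the findings.json schema above):

```python
def route_findings(findings):
    """Group Stage A findings by status to decide next steps (hypothetical helper)."""
    routes = {"poc_success": [], "not_disproven": [], "disproven": []}
    for finding in findings:
        routes[finding["status"]].append(finding["id"])
    return routes
```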
Stage B: Process
Purpose: Systematic analysis for “not_disproven” findings using attack trees and knowledge graphs.
Gates Applied
ALL gates (1-6):
- GATE-1: Assume exploitable
- GATE-2: Strictly follow instructions
- GATE-3: Update checklist, collect evidence
- GATE-4: No hedging
- GATE-5: Full code coverage
- GATE-6: Provide proof
Working Documents
Stage B creates 5 specialized documents:
1. attack-tree.json
Knowledge graph of attack paths:
{
"root": "Exploit buffer overflow in parse_header",
"updated_at": "2026-03-04T13:00:00Z",
"nodes": [
{
"id": "node-001",
"type": "goal",
"description": "Control instruction pointer",
"children": ["node-002", "node-003"],
"status": "testing"
},
{
"id": "node-002",
"type": "method",
"description": "Overwrite return address on stack",
"prerequisites": ["Stack overflow possible", "No stack canary"],
"status": "confirmed"
}
]
}
2. hypotheses.json
Testable predictions:
[
{
"id": "hyp-001",
"hypothesis": "Input length controls overflow distance",
"status": "confirmed",
"evidence": [
"Input of 100 bytes overwrites RBP",
"Input of 104 bytes overwrites return address"
],
"tested_at": "2026-03-04T13:15:00Z"
},
{
"id": "hyp-002",
"hypothesis": "Stack canary blocks exploitation",
"status": "disproven",
"evidence": ["Binary compiled without -fstack-protector"],
"tested_at": "2026-03-04T13:20:00Z"
}
]
3. disproven.json
Failed approaches:
[
{
"approach": "ROP chain via libc gadgets",
"why_failed": "ASLR randomizes libc base, no info leak available",
"attempted_at": "2026-03-04T13:30:00Z",
"learnings": "Need info leak primitive before ROP"
}
]
4. attack-paths.json
Attempted exploitation paths with PROXIMITY scoring:
[
{
"path_id": "path-001",
"description": "Direct return address overwrite",
"steps": [
"1. Send 104-byte input",
"2. Overwrite return address with shellcode location",
"3. Return from function to shellcode"
],
"proximity": 8,
"blockers": ["DEP prevents shellcode execution"],
"status": "blocked"
},
{
"path_id": "path-002",
"description": "ROP chain to mprotect()",
"steps": [
"1. Leak stack address",
"2. Build ROP chain calling mprotect()",
"3. Make stack executable",
"4. Jump to shellcode on stack"
],
"proximity": 5,
"blockers": ["No info leak primitive found"],
"status": "investigating"
}
]
PROXIMITY Scale:
10 - Working exploit
8-9 - Very close, minor obstacles
6-7 - Feasible path, some blockers
4-5 - Significant obstacles
1-3 - Far from exploitation
0 - Not viable
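The scale above translates directly into code. A minimal sketch (the `proximity_label` helper is hypothetical) mapping a score to its band:

```python
def proximity_label(score):
    """Map a PROXIMITY score (0-10) to its band on the scale above."""
    if score == 10:
        return "Working exploit"
    if score >= 8:
        return "Very close, minor obstacles"
    if score >= 6:
        return "Feasible path, some blockers"
    if score >= 4:
        return "Significant obstacles"
    if score >= 1:
        return "Far from exploitation"
    return "Not viable"
```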
5. attack-surface.json
Sources, sinks, and trust boundaries:
{
"sources": [
{
"type": "user_input",
"location": "src/parser.c:100",
"function": "read_header",
"description": "HTTP header from socket",
"controllable": true
}
],
"sinks": [
{
"type": "memory_operation",
"location": "src/parser.c:134",
"function": "parse_header",
"operation": "strcpy",
"dangerous": true
}
],
"trust_boundaries": [
{
"location": "src/parser.c:105",
"type": "validation",
"description": "Header length check",
"effective": false,
"reason": "Check uses signed comparison, negative values bypass"
}
]
}
Stage C: Sanity Check
Purpose: Verify findings against actual source code.
Gates Applied
- GATE-3 [CHECKLIST]: Update checklist with verification
- GATE-5 [FULL-COVERAGE]: Check all code, no sampling
- GATE-6 [PROOF]: Show actual code verbatim
Verification Checks
- File exists at stated path
- Code matches VERBATIM at stated line (not paraphrased)
- Source→sink flow is real (not hypothetical)
- Code is reachable (function is actually called)
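The first two checks can be sketched as a simple file-and-line comparison. This is an illustrative sketch, not the package's implementation; the `check_code_verbatim` name is hypothetical:

```python
def check_code_verbatim(path, line_no, expected):
    """Verify the reported code appears verbatim at the stated line (sketch)."""
    try:
        with open(path) as f:
            lines = f.readlines()
    except FileNotFoundError:
        return {"file_exists": False, "code_matches": False}
    if line_no < 1 or line_no > len(lines):
        return {"file_exists": True, "code_matches": False}
    actual = lines[line_no - 1]
    # Compare stripped text so indentation differences do not cause false failures
    return {"file_exists": True,
            "code_matches": actual.strip() == expected.strip(),
            "code_verbatim": actual.rstrip("\n")}
```

Flow and reachability checks require call-graph analysis and are not reducible to a line comparison like this.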
Output
findings.json with sanity_check field added:
{
"id": "FINDING-0001",
"file": "src/parser.c",
"line": 134,
"sanity_check": {
"passed": true,
"file_exists": true,
"code_matches": true,
"code_verbatim": " strcpy(buf, header);",
"flow_real": true,
"reachable": true,
"verified_at": "2026-03-04T14:00:00Z"
}
}
Stage D: Ruling
Purpose: Make final exploitability determination based on all evidence.
Gates Applied
- GATE-3 [CHECKLIST]: Document ruling decisions
- GATE-5 [FULL-COVERAGE]: Rule on all findings
- GATE-6 [PROOF]: Justify ruling with evidence
Ruling Criteria
Findings are ruled_out if:
- Failed sanity check
- Requires impossible preconditions
- Protected by effective mitigations
- Attack paths have PROXIMITY ≤ 2
Findings are confirmed if:
- Passed sanity check
- Realistic exploitation path exists
- No effective protections
- Attack paths have PROXIMITY ≥ 6
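The criteria above can be sketched as a decision function. Note that the documented thresholds leave a gap (proximity 3-5), which this sketch surfaces explicitly as needing analyst judgment; the `rule_finding` helper and its return values are hypothetical, not the package's API:

```python
def rule_finding(finding, attack_paths):
    """Apply the ruling criteria above (hypothetical sketch).

    `attack_paths` is the list of attack-paths.json entries for this finding.
    """
    if not finding.get("sanity_check", {}).get("passed", False):
        return "Ruled Out", "Failed sanity check"
    best = max((p["proximity"] for p in attack_paths), default=0)
    if best <= 2:
        return "Ruled Out", f"Best attack path has proximity {best}"
    if best >= 6:
        return "Confirmed", f"Realistic path with proximity {best}"
    return "Needs review", f"Proximity {best} falls between thresholds"
```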
Output
findings.json with ruling field:
{
"id": "FINDING-0001",
"ruling": {
"status": "Confirmed",
"reason": "Passed sanity check, direct exploitation path with proximity 8",
"attack_path": "path-001",
"prerequisites": [],
"ruled_at": "2026-03-04T14:30:00Z"
}
}
Status Values
Confirmed - Exploitable, proceed to Stage E
Ruled Out - Not exploitable, stop here
Stage E: Feasibility
Purpose: Binary constraint analysis for memory corruption vulnerabilities.
Scope: Stage E only applies to memory corruption types (buffer overflow, format string, UAF, etc.). Web/injection vulnerabilities stop at Stage D.
Memory Corruption Types
Stage E applies to:
buffer_overflow
heap_overflow
stack_overflow
format_string
use_after_free
double_free
integer_overflow
out_of_bounds_read
out_of_bounds_write
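The gating decision is a simple set membership test over the types listed above. A minimal sketch (the `needs_stage_e` helper is hypothetical):

```python
MEMORY_CORRUPTION_TYPES = {
    "buffer_overflow", "heap_overflow", "stack_overflow",
    "format_string", "use_after_free", "double_free",
    "integer_overflow", "out_of_bounds_read", "out_of_bounds_write",
}

def needs_stage_e(vuln_type):
    """Stage E only runs for memory corruption; web/injection stop at Stage D."""
    return vuln_type in MEMORY_CORRUPTION_TYPES
```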
Binary Analysis
Integrates with packages/exploit_feasibility for:
- Protection detection: ASLR, DEP, RELRO, stack canaries
- Constraint analysis: Bad bytes, null terminators
- Gadget availability: ROP gadgets, syscall availability
- Verdict: Likely / Difficult / Unlikely
Execution
from packages.exploit_feasibility import analyze_binary
result = analyze_binary(
binary_path="/path/to/binary",
vuln_type="buffer_overflow"
)
print(f"Verdict: {result['verdict']}")
print(f"Blockers: {result['blockers']}")
print(f"Suggestions: {result['suggestions']}")
Output
findings.json with feasibility and final_status:
{
"id": "FINDING-0001",
"feasibility": {
"status": "analyzed",
"binary_path": "/path/to/binary",
"verdict": "Difficult",
"chain_breaks": [
"ASLR randomizes code base",
"DEP prevents shellcode execution"
],
"what_would_help": [
"Info leak to defeat ASLR",
"ROP chain for code reuse"
]
},
"final_status": "Confirmed (constrained)"
}
Final Status Mapping
| Ruling Status | Feasibility Verdict | Final Status |
|---|---|---|
| Confirmed | Likely | Exploitable |
| Confirmed | Difficult | Confirmed (constrained) |
| Confirmed | Unlikely | Confirmed (blocked) |
| Confirmed | N/A (web vuln) | Confirmed |
| Ruled Out | - | Ruled Out |
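The mapping table above can be sketched as a small lookup. The `final_status` helper is hypothetical, assuming `verdict` is `None` for findings that skip Stage E:

```python
def final_status(ruling, verdict=None):
    """Combine the Stage D ruling with the Stage E verdict per the table above."""
    if ruling == "Ruled Out":
        return "Ruled Out"
    mapping = {
        "Likely": "Exploitable",
        "Difficult": "Confirmed (constrained)",
        "Unlikely": "Confirmed (blocked)",
    }
    # verdict is None for web/injection findings, which bypass Stage E
    return mapping.get(verdict, "Confirmed")
```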
CLI Usage
Full Pipeline
Run complete validation from scratch:
python3 -m packages.exploitability_validation \
--target /path/to/code \
--vuln-type buffer_overflow
With Pre-existing Findings
Validate findings from scanner output (skips Stage 0 and A):
python3 -m packages.exploitability_validation \
--target /path/to/code \
--findings scan_results.sarif
With Binary for Stage E
python3 -m packages.exploitability_validation \
--target /path/to/code \
--findings findings.json \
--binary /path/to/compiled/binary
Skip Stage E
python3 -m packages.exploitability_validation \
--target /path/to/code \
--skip-feasibility
Custom Working Directory
python3 -m packages.exploitability_validation \
--target /path/to/code \
--workdir /custom/output/path
Python API
Orchestrator
from packages.exploitability_validation import ValidationOrchestrator, PipelineConfig
config = PipelineConfig(
target_path="/path/to/code",
workdir=".out/validation-20260304/",
vuln_type="command_injection",
binary_path=None,
findings_file=None,
skip_feasibility=False
)
orchestrator = ValidationOrchestrator(config)
result = orchestrator.run()
print(f"Success: {result.state.completed_at}")
for stage, stage_result in result.state.stage_results.items():
print(f"{stage.name}: {stage_result.status}")
Convenience Function
from packages.exploitability_validation import run_validation
result = run_validation(
target_path="/path/to/code",
vuln_type="sql_injection",
findings_file="scanner_output.sarif"
)
The validation pipeline automatically converts SARIF format:
# Supported: SARIF 2.0 and 2.1.0
# From tools: Semgrep, CodeQL, others
config = PipelineConfig(
target_path="/path/to/code",
findings_file="semgrep_results.sarif" # Auto-detected format
)
SARIF Conversion
- Rule ID normalization:
engine.semgrep.rules.crypto.weak-hash → weak_hash
- CWE mapping:
CWE-89 → sql_injection
- Deduplication: By file:line:vuln_type
- Logical locations: Extracts function names
- Severity mapping: SARIF levels → internal severity
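Two of the conversion steps above are simple enough to sketch directly; this is illustrative only, and the helper names are hypothetical rather than the package's actual conversion code:

```python
def normalize_rule_id(rule_id):
    """e.g. 'engine.semgrep.rules.crypto.weak-hash' -> 'weak_hash'."""
    return rule_id.rsplit(".", 1)[-1].replace("-", "_")

def dedup_key(finding):
    """Findings are deduplicated by file:line:vuln_type."""
    return f"{finding['file']}:{finding['line']}:{finding['vuln_type']}"
```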
Validation Report
Final output: validation-report.md
# Exploitability Validation Report
## Summary
- Target: /path/to/code
- Vulnerability Type: buffer_overflow
- Started: 2026-03-04 12:00:00
- Completed: 2026-03-04 14:45:00
## Stage Results
- Stage 0 (Inventory): [OK] (12.3s)
- Stage A (One-Shot): [OK] (45.7s)
- Stage B (Process): [OK] (123.4s)
- Stage C (Sanity): [OK] (23.1s)
- Stage D (Ruling): [OK] (8.9s)
- Stage E (Feasibility): [OK] (15.2s)
## Findings Summary
- Total: 15
- Exploitable: 2
- Confirmed (constrained): 3
- Confirmed (blocked): 1
- Ruled Out: 9
## Confirmed Findings
### FINDING-0001: buffer_overflow in src/parser.c:134
- Function: parse_header
- Final Status: Exploitable
- Feasibility: Likely
- Chain Breaks: None
### FINDING-0003: format_string in src/logger.c:89
- Function: log_message
- Final Status: Confirmed (constrained)
- Feasibility: Difficult
- Chain Breaks: RELRO blocks GOT overwrite, PIE randomizes addresses
Output Style Guide
Per RAPTOR’s style conventions:
Human-Readable Status
- ✅ Exploitable (not EXPLOITABLE)
- ✅ Confirmed (not CONFIRMED)
- ✅ Ruled Out (not RULED_OUT)
- ✅ Proven / Disproven (not PROVEN / DISPROVEN)
No Colored Indicators
- ❌ Don't use: 🔴/🟢 (perspective-dependent)
- ✅ Use: plain text or `### Exploitable (7 findings)`
- ✅ Other emojis OK: ⚠️, ✓, etc.
Best Practices
Start with SARIF input: Feed scanner output directly to validation to avoid manual finding transcription. The pipeline auto-converts and deduplicates.
Stage B is intensive: For large codebases with many “not_disproven” findings, Stage B can take hours. Consider filtering to high-severity findings first.
Stage E requires binary: If no compiled binary is available, Stage E is skipped. Memory corruption findings will be marked Confirmed without feasibility analysis.
Troubleshooting
Stage A produces all “not_disproven”
This is normal for complex vulnerabilities. Stage B will analyze them systematically.
Stage C sanity checks fail
Common causes:
- Scanner output has stale file paths
- Code changed since scanning
- Scanner hallucinated the finding
Fix: Re-run scanner on current codebase.
Stage E skipped unexpectedly
Check:
- Binary path is correct: --binary /path/to/binary
- Binary is executable: chmod +x /path/to/binary
- Vulnerability type is memory corruption
Integration Examples
From Semgrep
# 1. Run Semgrep
python3 packages/static-analysis/scanner.py \
--repo /path/to/code \
--policy_groups all
# 2. Validate findings
python3 -m packages.exploitability_validation \
--target /path/to/code \
--findings out/scan_*/combined.sarif
From CodeQL
# 1. Run CodeQL
python3 raptor_codeql.py \
--repo /path/to/code \
--scan-only
# 2. Validate findings
python3 -m packages.exploitability_validation \
--target /path/to/code \
--findings out/codeql_*/java_results.sarif \
--binary /path/to/binary.jar
From Autonomous Mode
Validation runs automatically in /agentic:
/agentic /path/to/code
# Automatically runs:
# 1. Static analysis (Semgrep/CodeQL)
# 2. Exploitability validation (this pipeline)
# 3. LLM analysis
# 4. Exploit generation
See Also