Overview

The exploitability validation pipeline filters out false positives by systematically verifying, through a 6-stage process, that security findings are real, reachable, and exploitable.

Why Validation?

Static analysis tools produce many findings, but not all are exploitable:
  • Hallucinated findings: File doesn’t exist, code doesn’t match scanner output
  • Unreachable code: Dead code, test-only functions
  • Protected paths: Effective sanitization, impossible preconditions
  • Binary constraints: Mitigations that block exploitation
Validation prevents wasted effort on false positives.

The 6-Stage Pipeline

┌─────────────────────────────────────────────────────┐
│  Stage 0: Inventory                                 │
│  Build ground truth checklist of all functions      │
└─────────────┬───────────────────────────────────────┘
              │ checklist.json

┌─────────────────────────────────────────────────────┐
│  Stage A: One-Shot Analysis                         │
│  Quick exploitability check + PoC attempt           │
└─────────────┬───────────────────────────────────────┘
              │ findings.json (status: pending/not_disproven)

┌─────────────────────────────────────────────────────┐
│  Stage B: Process                                   │
│  Systematic analysis with attack trees              │
└─────────────┬───────────────────────────────────────┘
              │ 5 working documents

┌─────────────────────────────────────────────────────┐
│  Stage C: Sanity Check                              │
│  Validate against actual source code                │
└─────────────┬───────────────────────────────────────┘
              │ sanity_check added to findings

┌─────────────────────────────────────────────────────┐
│  Stage D: Ruling                                    │
│  Filter based on practical exploitation criteria    │
└─────────────┬───────────────────────────────────────┘
              │ ruling & confirmed findings

         ┌────┴────┐
         │         │
         ▼         ▼
    Memory      Web/Injection
    Corruption     (Done)


┌─────────────────────────────────────────────────────┐
│  Stage E: Feasibility                               │
│  Binary constraint analysis for memory corruption   │
└─────────────┬───────────────────────────────────────┘
              │ final_status & feasibility

          validation-report.md

Stage 0: Inventory

Purpose: Build a complete checklist of all code to be analyzed.

Output

checklist.json - Complete function inventory:
{
  "generated_at": "2026-03-04T12:00:00Z",
  "target_path": "/path/to/code",
  "total_files": 42,
  "total_functions": 256,
  "files": [
    {
      "path": "src/parser.c",
      "functions": [
        {
          "name": "parse_header",
          "line_start": 120,
          "line_end": 145,
          "checked": false
        }
      ]
    }
  ]
}

Execution

from packages.exploitability_validation.checklist_builder import build_checklist

checklist = build_checklist(
    target_path="/path/to/code",
    workdir=".out/validation/",
    exclude_patterns=["*_test.*", "test_*"]
)
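The resulting checklist is what GATE-3 and GATE-5 track against in later stages. As an illustration (not part of the package API — `coverage_summary` is a hypothetical helper), here is how the checklist.json schema above could be consumed to report remaining coverage:

```python
import json

def coverage_summary(checklist: dict) -> dict:
    """Count checked vs. unchecked functions (supports GATE-5 full coverage)."""
    total = checked = 0
    for file_entry in checklist["files"]:
        for func in file_entry["functions"]:
            total += 1
            checked += 1 if func["checked"] else 0
    return {"total": total, "checked": checked, "remaining": total - checked}

# Using the schema shown above:
checklist = json.loads("""{
  "files": [
    {"path": "src/parser.c",
     "functions": [{"name": "parse_header", "line_start": 120,
                    "line_end": 145, "checked": false}]}
  ]
}""")
print(coverage_summary(checklist))  # → {'total': 1, 'checked': 0, 'remaining': 1}
```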

Stage A: One-Shot Analysis

Purpose: Quick exploitability assessment with PoC attempts.

Gates Applied

  • GATE-1 [ASSUME-EXPLOIT]: Assume findings are exploitable until proven otherwise
  • GATE-4 [NO-HEDGING]: No “maybe” or “could be” - verify all claims
  • GATE-6 [PROOF]: Provide concrete proof and vulnerable code

Output

findings.json - Initial exploitability assessment:
{
  "stage": "A",
  "timestamp": "2026-03-04T12:30:00Z",
  "findings": [
    {
      "id": "FINDING-0001",
      "file": "src/parser.c",
      "line": 134,
      "function": "parse_header",
      "vuln_type": "buffer_overflow",
      "status": "not_disproven",
      "message": "Unbounded strcpy into fixed buffer",
      "proof": "strcpy(buf, header);",
      "poc_attempted": true,
      "poc_result": "crash with SIGSEGV"
    }
  ]
}

Status Values

  • poc_success - PoC successfully demonstrated vulnerability
  • not_disproven - Cannot rule out, needs deeper analysis (Stage B)
  • disproven - Proven safe, no further analysis needed
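The routing implied by these status values can be sketched as follows (`route_finding` is a hypothetical helper for illustration, not a function exported by the package):

```python
def route_finding(finding: dict) -> str:
    """Decide the next pipeline step from a Stage A status value."""
    status = finding["status"]
    if status == "poc_success":
        return "report"     # vulnerability already demonstrated
    if status == "not_disproven":
        return "stage_b"    # needs systematic analysis
    if status == "disproven":
        return "stop"       # proven safe, no further work
    raise ValueError(f"unknown status: {status}")

# FINDING-0001 above has status "not_disproven", so it proceeds to Stage B:
print(route_finding({"status": "not_disproven"}))  # → stage_b
```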

Stage B: Process

Purpose: Systematic analysis for “not_disproven” findings using attack trees and knowledge graphs.

Gates Applied

ALL gates (1-6):
  • GATE-1: Assume exploitable
  • GATE-2: Strictly follow instructions
  • GATE-3: Update checklist, collect evidence
  • GATE-4: No hedging
  • GATE-5: Full code coverage
  • GATE-6: Provide proof

Working Documents

Stage B creates 5 specialized documents:

1. attack-tree.json

Knowledge graph of attack paths:
{
  "root": "Exploit buffer overflow in parse_header",
  "updated_at": "2026-03-04T13:00:00Z",
  "nodes": [
    {
      "id": "node-001",
      "type": "goal",
      "description": "Control instruction pointer",
      "children": ["node-002", "node-003"],
      "status": "testing"
    },
    {
      "id": "node-002",
      "type": "method",
      "description": "Overwrite return address on stack",
      "prerequisites": ["Stack overflow possible", "No stack canary"],
      "status": "confirmed"
    }
  ]
}

2. hypotheses.json

Testable predictions:
[
  {
    "id": "hyp-001",
    "hypothesis": "Input length controls overflow distance",
    "status": "confirmed",
    "evidence": [
      "Input of 100 bytes overwrites RBP",
      "Input of 104 bytes overwrites return address"
    ],
    "tested_at": "2026-03-04T13:15:00Z"
  },
  {
    "id": "hyp-002",
    "hypothesis": "Stack canary blocks exploitation",
    "status": "disproven",
    "evidence": ["Binary compiled without -fstack-protector"],
    "tested_at": "2026-03-04T13:20:00Z"
  }
]

3. disproven.json

Failed approaches:
[
  {
    "approach": "ROP chain via libc gadgets",
    "why_failed": "ASLR randomizes libc base, no info leak available",
    "attempted_at": "2026-03-04T13:30:00Z",
    "learnings": "Need info leak primitive before ROP"
  }
]

4. attack-paths.json

Attempted exploitation paths with PROXIMITY scoring:
[
  {
    "path_id": "path-001",
    "description": "Direct return address overwrite",
    "steps": [
      "1. Send 104-byte input",
      "2. Overwrite return address with shellcode location",
      "3. Return from function to shellcode"
    ],
    "proximity": 8,
    "blockers": ["DEP prevents shellcode execution"],
    "status": "blocked"
  },
  {
    "path_id": "path-002",
    "description": "ROP chain to mprotect()",
    "steps": [
      "1. Leak stack address",
      "2. Build ROP chain calling mprotect()",
      "3. Make stack executable",
      "4. Jump to shellcode on stack"
    ],
    "proximity": 5,
    "blockers": ["No info leak primitive found"],
    "status": "investigating"
  }
]
PROXIMITY Scale:
  • 10 - Working exploit
  • 8-9 - Very close, minor obstacles
  • 6-7 - Feasible path, some blockers
  • 4-5 - Significant obstacles
  • 1-3 - Far from exploitation
  • 0 - Not viable
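For reference, the scale above maps onto code like this (a sketch; `proximity_label` is a hypothetical helper, not part of the package):

```python
def proximity_label(score: int) -> str:
    """Map a PROXIMITY score (0-10) to the scale described above."""
    if not 0 <= score <= 10:
        raise ValueError("PROXIMITY must be 0-10")
    if score == 10:
        return "Working exploit"
    if score >= 8:
        return "Very close, minor obstacles"
    if score >= 6:
        return "Feasible path, some blockers"
    if score >= 4:
        return "Significant obstacles"
    if score >= 1:
        return "Far from exploitation"
    return "Not viable"

# path-001 above scored 8:
print(proximity_label(8))  # → Very close, minor obstacles
```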

5. attack-surface.json

Sources, sinks, and trust boundaries:
{
  "sources": [
    {
      "type": "user_input",
      "location": "src/parser.c:100",
      "function": "read_header",
      "description": "HTTP header from socket",
      "controllable": true
    }
  ],
  "sinks": [
    {
      "type": "memory_operation",
      "location": "src/parser.c:134",
      "function": "parse_header",
      "operation": "strcpy",
      "dangerous": true
    }
  ],
  "trust_boundaries": [
    {
      "location": "src/parser.c:105",
      "type": "validation",
      "description": "Header length check",
      "effective": false,
      "reason": "Check uses signed comparison, negative values bypass"
    }
  ]
}
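One common use of this document is pairing controllable sources with dangerous sinks as candidate flows to investigate. A minimal sketch (the helper name is hypothetical; actual reachability of each pair is established by the attack tree, not this pairing):

```python
def risky_pairs(surface: dict) -> list:
    """List (source, sink) location pairs where the input is
    attacker-controllable and the sink is dangerous."""
    return [
        (src["location"], sink["location"])
        for src in surface["sources"] if src["controllable"]
        for sink in surface["sinks"] if sink["dangerous"]
    ]

surface = {
    "sources": [{"location": "src/parser.c:100", "controllable": True}],
    "sinks": [{"location": "src/parser.c:134", "dangerous": True}],
}
print(risky_pairs(surface))  # → [('src/parser.c:100', 'src/parser.c:134')]
```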

Stage C: Sanity Check

Purpose: Verify findings against actual source code.

Gates Applied

  • GATE-3 [CHECKLIST]: Update checklist with verification
  • GATE-5 [FULL-COVERAGE]: Check all code, no sampling
  • GATE-6 [PROOF]: Show actual code verbatim

Verification Checks

  1. File exists at stated path
  2. Code matches VERBATIM at stated line (not paraphrased)
  3. Source→sink flow is real (not hypothetical)
  4. Code is reachable (function is actually called)
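Checks 1 and 2 boil down to reading the file and comparing the stated line against the scanner's snippet. A minimal sketch, assuming a simple one-line match (`verbatim_check` is illustrative, not the package's implementation):

```python
from pathlib import Path

def verbatim_check(file: str, line: int, expected: str) -> bool:
    """Check 1: the file exists. Check 2: the code at the stated
    line matches the scanner's snippet verbatim (whitespace-trimmed)."""
    path = Path(file)
    if not path.is_file():
        return False
    lines = path.read_text().splitlines()
    if not 1 <= line <= len(lines):
        return False
    return lines[line - 1].strip() == expected.strip()
```

Flow reality and reachability (checks 3 and 4) need deeper analysis than a line comparison, which is why they draw on the Stage B documents.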

Output

findings.json with sanity_check field added:
{
  "id": "FINDING-0001",
  "file": "src/parser.c",
  "line": 134,
  "sanity_check": {
    "passed": true,
    "file_exists": true,
    "code_matches": true,
    "code_verbatim": "    strcpy(buf, header);",
    "flow_real": true,
    "reachable": true,
    "verified_at": "2026-03-04T14:00:00Z"
  }
}

Stage D: Ruling

Purpose: Make final exploitability determination based on all evidence.

Gates Applied

  • GATE-3 [CHECKLIST]: Document ruling decisions
  • GATE-5 [FULL-COVERAGE]: Rule on all findings
  • GATE-6 [PROOF]: Justify ruling with evidence

Ruling Criteria

Findings are ruled_out if:
  • Failed sanity check
  • Requires impossible preconditions
  • Protected by effective mitigations
  • Attack paths have PROXIMITY ≤ 2
Findings are confirmed if:
  • Passed sanity check
  • Realistic exploitation path exists
  • No effective protections
  • Attack paths have PROXIMITY ≥ 6
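A simplified sketch of these criteria (hypothetical helper; the real stage also weighs preconditions and mitigations, and note the criteria above leave PROXIMITY 3-5 undecided — flagging those for review is this sketch's assumption):

```python
def rule(finding: dict, best_proximity: int) -> str:
    """Apply the ruling criteria above to one finding."""
    if not finding.get("sanity_check", {}).get("passed"):
        return "Ruled Out"      # failed sanity check
    if best_proximity <= 2:
        return "Ruled Out"      # no viable attack path
    if best_proximity >= 6:
        return "Confirmed"      # realistic exploitation path
    return "Needs review"       # PROXIMITY 3-5: neither criterion met
```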

Output

findings.json with ruling field:
{
  "id": "FINDING-0001",
  "ruling": {
    "status": "Confirmed",
    "reason": "Passed sanity check, direct exploitation path with proximity 8",
    "attack_path": "path-001",
    "prerequisites": [],
    "ruled_at": "2026-03-04T14:30:00Z"
  }
}

Status Values

  • Confirmed - Exploitable, proceed to Stage E
  • Ruled Out - Not exploitable, stop here

Stage E: Feasibility

Purpose: Binary constraint analysis for memory corruption vulnerabilities.
Scope: Stage E only applies to memory corruption types (buffer overflow, format string, UAF, etc.). Web/injection vulnerabilities stop at Stage D.

Memory Corruption Types

Stage E applies to:
  • buffer_overflow
  • heap_overflow
  • stack_overflow
  • format_string
  • use_after_free
  • double_free
  • integer_overflow
  • out_of_bounds_read
  • out_of_bounds_write

Binary Analysis

Integrates with packages/exploit_feasibility for:
  1. Protection detection: ASLR, DEP, RELRO, stack canaries
  2. Constraint analysis: Bad bytes, null terminators
  3. Gadget availability: ROP gadgets, syscall availability
  4. Verdict: Likely / Difficult / Unlikely

Execution

from packages.exploit_feasibility import analyze_binary

result = analyze_binary(
    binary_path="/path/to/binary",
    vuln_type="buffer_overflow"
)

print(f"Verdict: {result['verdict']}")
print(f"Blockers: {result['blockers']}")
print(f"Suggestions: {result['suggestions']}")

Output

findings.json with feasibility and final_status:
{
  "id": "FINDING-0001",
  "feasibility": {
    "status": "analyzed",
    "binary_path": "/path/to/binary",
    "verdict": "Difficult",
    "chain_breaks": [
      "ASLR randomizes code base",
      "DEP prevents shellcode execution"
    ],
    "what_would_help": [
      "Info leak to defeat ASLR",
      "ROP chain for code reuse"
    ]
  },
  "final_status": "Confirmed (constrained)"
}

Final Status Mapping

Ruling Status | Feasibility Verdict | Final Status
------------- | ------------------- | -----------------------
Confirmed     | Likely              | Exploitable
Confirmed     | Difficult           | Confirmed (constrained)
Confirmed     | Unlikely            | Confirmed (blocked)
Confirmed     | N/A (web vuln)      | Confirmed
Ruled Out     | -                   | Ruled Out
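The mapping is mechanical and can be sketched as (`final_status` is a hypothetical helper mirroring the table, not the package API):

```python
def final_status(ruling: str, verdict=None) -> str:
    """Combine the Stage D ruling with the Stage E verdict.
    verdict=None means Stage E did not apply (e.g. web/injection)."""
    if ruling == "Ruled Out":
        return "Ruled Out"
    if verdict is None:
        return "Confirmed"
    return {
        "Likely": "Exploitable",
        "Difficult": "Confirmed (constrained)",
        "Unlikely": "Confirmed (blocked)",
    }[verdict]

print(final_status("Confirmed", "Difficult"))  # → Confirmed (constrained)
```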

CLI Usage

Full Pipeline

Run complete validation from scratch:
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --vuln-type buffer_overflow

With Pre-existing Findings

Validate findings from scanner output (skips Stages 0 and A):
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings scan_results.sarif

With Binary for Stage E

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings findings.json \
  --binary /path/to/compiled/binary

Skip Stage E

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --skip-feasibility

Custom Working Directory

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --workdir /custom/output/path

Python API

Orchestrator

from packages.exploitability_validation import ValidationOrchestrator, PipelineConfig

config = PipelineConfig(
    target_path="/path/to/code",
    workdir=".out/validation-20260304/",
    vuln_type="command_injection",
    binary_path=None,
    findings_file=None,
    skip_feasibility=False
)

orchestrator = ValidationOrchestrator(config)
result = orchestrator.run()

print(f"Success: {result.state.completed_at}")
for stage, stage_result in result.state.stage_results.items():
    print(f"{stage.name}: {stage_result.status}")

Convenience Function

from packages.exploitability_validation import run_validation

result = run_validation(
    target_path="/path/to/code",
    vuln_type="sql_injection",
    findings_file="scanner_output.sarif"
)

SARIF Input Support

The validation pipeline automatically converts SARIF format:
# Supported: SARIF 2.0 and 2.1.0
# From tools: Semgrep, CodeQL, others

config = PipelineConfig(
    target_path="/path/to/code",
    findings_file="semgrep_results.sarif"  # Auto-detected format
)

SARIF Conversion

  • Rule ID normalization: engine.semgrep.rules.crypto.weak-hash → weak_hash
  • CWE mapping: CWE-89 → sql_injection
  • Deduplication: By file:line:vuln_type
  • Logical locations: Extracts function names
  • Severity mapping: SARIF levels → internal severity
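The deduplication rule above, for example, amounts to keying findings by file:line:vuln_type and keeping the first occurrence. A sketch of that behavior (illustrative helpers, not the converter's actual code):

```python
def dedup_key(finding: dict) -> str:
    """Deduplication key as described above: file:line:vuln_type."""
    return f"{finding['file']}:{finding['line']}:{finding['vuln_type']}"

def dedupe(findings: list) -> list:
    """Keep the first finding for each key, preserving input order."""
    seen, out = set(), []
    for f in findings:
        k = dedup_key(f)
        if k not in seen:
            seen.add(k)
            out.append(f)
    return out

reports = [
    {"file": "a.c", "line": 10, "vuln_type": "sql_injection"},
    {"file": "a.c", "line": 10, "vuln_type": "sql_injection"},  # duplicate
    {"file": "a.c", "line": 20, "vuln_type": "sql_injection"},
]
print(len(dedupe(reports)))  # → 2
```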

Validation Report

Final output: validation-report.md
# Exploitability Validation Report

## Summary
- Target: /path/to/code
- Vulnerability Type: buffer_overflow
- Started: 2026-03-04 12:00:00
- Completed: 2026-03-04 14:45:00

## Stage Results
- Stage 0 (Inventory): [OK] (12.3s)
- Stage A (One-Shot): [OK] (45.7s)
- Stage B (Process): [OK] (123.4s)
- Stage C (Sanity): [OK] (23.1s)
- Stage D (Ruling): [OK] (8.9s)
- Stage E (Feasibility): [OK] (15.2s)

## Findings Summary
- Total: 15
- Exploitable: 2
- Confirmed (constrained): 3
- Confirmed (blocked): 1
- Ruled Out: 9

## Confirmed Findings

### FINDING-0001: buffer_overflow in src/parser.c:134
- Function: parse_header
- Final Status: Exploitable
- Feasibility: Likely
- Chain Breaks: None

### FINDING-0003: format_string in src/logger.c:89
- Function: log_message
- Final Status: Confirmed (constrained)
- Feasibility: Difficult
- Chain Breaks: RELRO blocks GOT overwrite, PIE randomizes addresses

Output Style Guide

Per RAPTOR’s style conventions:

Human-Readable Status

  • Exploitable (not EXPLOITABLE)
  • Confirmed (not CONFIRMED)
  • Ruled Out (not RULED_OUT)
  • Proven / Disproven (not PROVEN / DISPROVEN)

No Colored Indicators

  • ❌ Don’t use: 🔴/🟢 (perspective-dependent)
  • ✅ Use: Plain text or ### Exploitable (7 findings)
  • ✅ Other emojis OK: ⚠️, ✓, etc.

Best Practices

Start with SARIF input: Feed scanner output directly to validation to avoid manual finding transcription. The pipeline auto-converts and deduplicates.
Stage B is intensive: For large codebases with many “not_disproven” findings, Stage B can take hours. Consider filtering to high-severity findings first.
Stage E requires binary: If no compiled binary is available, Stage E is skipped. Memory corruption findings will be marked Confirmed without feasibility analysis.

Troubleshooting

Stage A produces all “not_disproven”

This is normal for complex vulnerabilities. Stage B will analyze them systematically.

Stage C sanity checks fail

Common causes:
  • Scanner output has stale file paths
  • Code changed since scanning
  • Scanner hallucinated the finding
Fix: Re-run scanner on current codebase.

Stage E skipped unexpectedly

Check:
  • Binary path is correct: --binary /path/to/binary
  • Binary is executable: chmod +x /path/to/binary
  • Vulnerability type is memory corruption

Integration Examples

From Semgrep

# 1. Run Semgrep
python3 packages/static-analysis/scanner.py \
  --repo /path/to/code \
  --policy_groups all

# 2. Validate findings
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings out/scan_*/combined.sarif

From CodeQL

# 1. Run CodeQL
python3 raptor_codeql.py \
  --repo /path/to/code \
  --scan-only

# 2. Validate findings
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings out/codeql_*/java_results.sarif \
  --binary /path/to/binary.jar

From Autonomous Mode

Validation runs automatically in /agentic:
/agentic /path/to/code
# Automatically runs:
# 1. Static analysis (Semgrep/CodeQL)
# 2. Exploitability validation (this pipeline)
# 3. LLM analysis
# 4. Exploit generation
