# Binary Analysis

## Overview
The Binary Analysis package provides comprehensive crash analysis capabilities for binary executables. It extracts exploitability information using debuggers (GDB/LLDB), disassembly, and LLM-powered analysis.
## Purpose

Analyze crashes and binaries with:

- Crash analysis: extract stack traces, registers, and crash instructions
- Debugger integration: GDB (Linux/Windows) and LLDB (macOS)
- Disassembly: function-level code analysis
- Exploitability assessment: automated triage
- Symbol table extraction: address-to-function mapping
## Architecture

```
packages/binary_analysis/
├── crash_analyser.py   # Main crash analysis engine
└── debugger.py         # GDB/LLDB integration
```
## Quick Start

### Basic Crash Analysis

```python
from pathlib import Path
from packages.binary_analysis import CrashAnalyser

# Initialize the analyser
analyser = CrashAnalyser(binary_path=Path("/path/to/binary"))

# Analyze a crash
context = analyser.analyze_crash(
    input_file=Path("crashes/crash_001"),
    signal="SIGSEGV",
)

print(f"Crash Type: {context.crash_type}")
print(f"Exploitability: {context.exploitability}")
print(f"Stack Trace:\n{context.stack_trace}")
print(f"Registers: {context.registers}")
```
### With LLM Analysis

```python
from pathlib import Path
from packages.binary_analysis import CrashAnalyser
from packages.llm_analysis import AutonomousSecurityAgentV2

# 1. Extract crash context
analyser = CrashAnalyser(Path("target_binary"))
context = analyser.analyze_crash(
    input_file=Path("crash_input"),
    signal="SIGSEGV",
)

# 2. LLM analysis for deeper understanding
agent = AutonomousSecurityAgentV2(
    repo_path=Path("."),
    out_dir=Path("out/analysis"),
)
exploit = agent.generate_exploit_for_crash(context)
print(f"Generated exploit:\n{exploit}")
```
## Core Classes

### CrashAnalyser

The main crash analysis engine.

```python
class CrashAnalyser:
    def __init__(self, binary_path: Path): ...

    def analyze_crash(
        self,
        input_file: Path,
        signal: str,
        timeout: int = 30,
    ) -> CrashContext: ...

    def get_stack_trace(self, input_file: Path) -> str: ...

    def get_registers(self, input_file: Path) -> Dict[str, str]: ...

    def get_crash_instruction(self, address: str) -> str: ...

    def estimate_exploitability(self, context: CrashContext) -> str: ...
```

Parameters:

- `binary_path`: Path to the target binary executable
### CrashContext

Complete context for a single crash.

```python
@dataclass
class CrashContext:
    crash_id: str
    binary_path: Path
    input_file: Path
    signal: str

    # From debugger
    stack_trace: str
    registers: Dict[str, str]
    crash_instruction: str
    crash_address: str
    stack_hash: str

    # From disassembly
    disassembly: str
    function_name: str
    source_location: str

    # Binary information
    binary_info: Dict[str, str]

    # Analysis results
    exploitability: str
    crash_type: str
    cvss_estimate: float
    analysis: Dict

    # Generated artifacts
    exploit_code: Optional[str]
```

Key fields:

- `crash_id`: Unique crash identifier (SHA256 hash)
- `signal`: Signal that caused the crash (SIGSEGV, SIGABRT, SIGILL, etc.)
- `stack_trace`: Full stack trace from the debugger
- `registers`: CPU register values at the crash (PC, SP, etc.)
- `crash_instruction`: Disassembly of the crashing instruction
- `exploitability`: Exploitability estimate: "exploitable", "likely", "unlikely", or "not_exploitable"
- `crash_type`: Crash classification: "heap_overflow", "stack_overflow", "null_deref", "use_after_free", etc.
### GDBDebugger

GDB integration for Linux/Windows binaries.

```python
from pathlib import Path
from packages.binary_analysis import GDBDebugger

debugger = GDBDebugger(binary_path=Path("target"))

# Get a stack trace
stack_trace = debugger.get_stack_trace(input_file=Path("crash_input"))

# Get a register dump
registers = debugger.get_registers(input_file=Path("crash_input"))
print(f"PC: {registers['pc']}")
print(f"SP: {registers['sp']}")
```
## Analysis Workflow

### Complete Crash Triage

```python
import json
from pathlib import Path
from packages.binary_analysis import CrashAnalyser

analyser = CrashAnalyser(Path("target_binary"))

# Analyze multiple crashes
crash_dir = Path("crashes/")
results = []

for crash_file in crash_dir.glob("id:*"):
    # Extract the signal number from an AFL-style filename: id:000000,sig:11
    # (split off any trailing fields such as ",src:...")
    if ",sig:" in crash_file.name:
        signal_part = crash_file.name.split(",sig:")[1].split(",")[0]
    else:
        signal_part = "11"
    signal_map = {"11": "SIGSEGV", "6": "SIGABRT", "4": "SIGILL"}
    signal = signal_map.get(signal_part, "SIGSEGV")

    # Analyze the crash
    context = analyser.analyze_crash(input_file=crash_file, signal=signal)
    results.append({
        "crash_id": context.crash_id,
        "input": str(crash_file),
        "signal": signal,
        "type": context.crash_type,
        "exploitability": context.exploitability,
        "cvss": context.cvss_estimate,
        "stack_hash": context.stack_hash,
    })

# Sort by exploitability, most severe first
results.sort(key=lambda x: {
    "exploitable": 4,
    "likely": 3,
    "unlikely": 2,
    "not_exploitable": 1,
}.get(x["exploitability"], 0), reverse=True)

# Save the report
with open("crash_triage.json", "w") as f:
    json.dump(results, f, indent=2)

print(f"Analyzed {len(results)} crashes")
print(f"Exploitable: {sum(1 for r in results if r['exploitability'] == 'exploitable')}")
```
## Debugger Support

### Automatic Detection

The analyser automatically selects the appropriate debugger:

```python
# On Linux:   uses GDB
# On macOS:   uses LLDB (for Mach-O binaries)
# On Windows: uses GDB (if available)
analyser = CrashAnalyser(binary_path)
# The debugger is auto-detected from the platform and binary type
```
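The selection rule above can be sketched in a few lines. This is an illustrative stand-in for the analyser's internal logic, not the package's actual code:

```python
import platform

def pick_debugger() -> str:
    """Mirror the documented selection: LLDB on macOS, GDB elsewhere."""
    # macOS (Darwin) ships LLDB and debugs Mach-O binaries with it;
    # Linux and Windows fall back to GDB.
    return "lldb" if platform.system() == "Darwin" else "gdb"

print(pick_debugger())
```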
### GDB Commands

```python
# Internally run in GDB batch mode
commands = [
    "run < input_file",
    "bt full",          # Full backtrace
    "info registers",   # Register dump
    "x/i $pc",          # Crash instruction
    "info frame",       # Frame info
]
```
### LLDB Commands (macOS)

```python
# For macOS binaries
commands = [
    "process launch -i input_file",
    "bt all",           # Backtrace
    "register read",    # Registers
    "disassemble -p",   # Crash instruction
]
```
## Crash Classification

### Automatic Classification

The analyser classifies crashes based on:

- Signal type: SIGSEGV, SIGABRT, SIGILL, SIGFPE
- Crash address: NULL, heap, stack, or code regions
- Instruction pattern: write, read, execute, call
- Stack trace patterns: malloc/free, memcpy, strcpy
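As a rough sketch of how these factors combine, the toy classifier below applies rules of this shape. The thresholds and patterns are illustrative assumptions, not the package's actual heuristics:

```python
def classify_crash(signal: str, crash_address: int, stack_trace: str) -> str:
    """Toy classifier combining signal, fault address, and trace patterns."""
    if signal == "SIGFPE":
        return "integer_overflow"
    if signal == "SIGABRT" and "free" in stack_trace:
        return "double_free"
    if signal == "SIGSEGV":
        if crash_address < 0x1000:              # faulting near the NULL page
            return "null_deref"
        if "malloc" in stack_trace or "free" in stack_trace:
            return "heap_overflow"
        if any(fn in stack_trace for fn in ("memcpy", "strcpy")):
            return "stack_overflow"
    return "unknown"

print(classify_crash("SIGSEGV", 0x0, "#0 main"))   # null_deref
```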
### Crash Types

| Type | Signal | Exploitability | Description |
|---|---|---|---|
| `heap_overflow` | SIGSEGV | High | Heap buffer overflow |
| `stack_overflow` | SIGSEGV | High | Stack buffer overflow |
| `use_after_free` | SIGSEGV | High | Use-after-free |
| `double_free` | SIGABRT | Medium | Double free |
| `null_deref` | SIGSEGV | Low | NULL pointer dereference |
| `integer_overflow` | SIGFPE | Medium | Integer overflow |
| `format_string` | SIGSEGV | High | Format string vulnerability |
## Exploitability Assessment

### Heuristic Analysis

```python
context = analyser.analyze_crash(input_file, signal)

# Exploitability factors:
# - Crash type (heap overflow = high)
# - Write vs. read (write = higher)
# - Control over the crash address (PC control = exploitable)
# - ASAN detection (heap bugs = exploitable)
# - Stack trace patterns (strcpy/memcpy = likely)

if context.exploitability == "exploitable":
    print("High-confidence exploitable vulnerability")
elif context.exploitability == "likely":
    print("Likely exploitable with some effort")
```
### Exploitability Levels

| Level | Criteria | Action |
|---|---|---|
| `exploitable` | PC control, heap overflow, format string | Immediate analysis |
| `likely` | Stack overflow, UAF, double free | Deep analysis |
| `unlikely` | NULL deref, assert failure | Low priority |
| `not_exploitable` | Timeout, OOM, intentional abort | Skip |
## Binary Information

```python
context = analyser.analyze_crash(input_file, signal)

print("Binary Information:")
for key, value in context.binary_info.items():
    print(f"  {key}: {value}")

# Example output:
#   arch: x86_64
#   os: Linux
#   stripped: False
#   nx: True
#   pie: True
#   relro: Full
#   canary: True
#   asan: False
```
### Check Security Features

```python
from packages.exploit_feasibility import analyze_binary

# Deep binary analysis
result = analyze_binary("/path/to/binary")
print(f"PIE: {result['mitigations']['pie_enabled']}")
print(f"Canary: {result['mitigations']['stack_canary']}")
print(f"NX: {result['mitigations']['nx_enabled']}")
print(f"RELRO: {result['mitigations']['relro']}")
```
## Symbol Table

### Address-to-Function Mapping

```python
# The symbol table is cached automatically
analyser = CrashAnalyser(binary_path)

# Resolve a crash address to a function (internal cache)
function = analyser._symbol_cache.get("0x401234")
print(f"Crashed in function: {function}")
```
### Source Location

```python
# If debug symbols are available
if context.source_location:
    print(f"Crash location: {context.source_location}")
    # Example: "src/parser.c:145"
```
## Stack Hashing

### Deduplication

```python
# Crashes with the same stack_hash are duplicates
crashes = []
for input_file in crash_files:
    context = analyser.analyze_crash(input_file, "SIGSEGV")
    crashes.append(context)

# Deduplicate by stack hash
unique_crashes = {}
for crash in crashes:
    if crash.stack_hash not in unique_crashes:
        unique_crashes[crash.stack_hash] = crash

print(f"Total crashes: {len(crashes)}")
print(f"Unique crashes: {len(unique_crashes)}")
```
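A common way to compute such a hash is to digest only the top few frames, so crashes that share a root cause but diverge deeper in the call stack still collide. The function below is a hedged sketch of that idea, not the analyser's actual implementation:

```python
import hashlib

def stack_hash(stack_trace: str, top_frames: int = 5) -> str:
    """Hash the innermost frames of a textual backtrace for deduplication."""
    frames = [line.strip() for line in stack_trace.splitlines() if line.strip()]
    key = "\n".join(frames[:top_frames])
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Same top frames, different outer callers -> same hash
a = stack_hash("#0 strcpy\n#1 parse\n#2 main\n#3 init", top_frames=3)
b = stack_hash("#0 strcpy\n#1 parse\n#2 main\n#3 other_caller", top_frames=3)
print(a == b)   # True
```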
## Configuration

### Timeouts

```python
# Adjust the crash analysis timeout
context = analyser.analyze_crash(
    input_file=crash_file,
    signal="SIGSEGV",
    timeout=60,  # 60 seconds (default: 30)
)
```
### Required Tools

The analyser checks for these tools:

- `nm`: Symbol table extraction
- `addr2line`: Address-to-source resolution
- `objdump`: Disassembly
- `readelf`: ELF header analysis (Linux)
- `file`: File type identification
- `strings`: String extraction
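A minimal pre-flight check for these helpers can be done with the standard library; this is a sketch, and the analyser's own detection logic may differ:

```python
import shutil

TOOLS = ["nm", "addr2line", "objdump", "readelf", "file", "strings"]

# Report any binutils helpers missing from PATH; analysis steps that
# need them (symbolization, disassembly, ...) would be skipped or degraded.
missing = [tool for tool in TOOLS if shutil.which(tool) is None]
print(f"missing: {missing}")
```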
## Output Structure

```
out/crashes/
├── crash_abc123/
│   ├── input             # Crashing input
│   ├── stack_trace.txt   # Full stack trace
│   ├── registers.txt     # Register dump
│   ├── disassembly.txt   # Crash instruction disassembly
│   ├── context.json      # Serialized CrashContext
│   └── analysis.json     # LLM analysis (if available)
└── crash_def456/
```
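This layout can be produced with plain `pathlib`. The sketch below uses a dict in place of a real CrashContext, and the exact serialization the package uses is an assumption:

```python
import json
import tempfile
from pathlib import Path

def write_crash_dir(out_root: Path, crash_id: str, context: dict) -> Path:
    """Persist one crash's artifacts in the out/crashes layout shown above."""
    crash_dir = out_root / f"crash_{crash_id}"
    crash_dir.mkdir(parents=True, exist_ok=True)
    (crash_dir / "stack_trace.txt").write_text(context["stack_trace"])
    (crash_dir / "registers.txt").write_text(json.dumps(context["registers"]))
    (crash_dir / "context.json").write_text(json.dumps(context, indent=2))
    return crash_dir

root = Path(tempfile.mkdtemp())
d = write_crash_dir(root, "abc123", {
    "stack_trace": "#0 strcpy\n#1 main",
    "registers": {"pc": "0x401234", "sp": "0x7ffc0000"},
})
print(sorted(p.name for p in d.iterdir()))
# ['context.json', 'registers.txt', 'stack_trace.txt']
```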
## Integration

### With Fuzzing

```python
from packages.fuzzing import AFLRunner, CrashCollector
from packages.binary_analysis import CrashAnalyser

# 1. Fuzz
runner = AFLRunner(...)
runner.run_parallel_fuzzers(...)

# 2. Collect crashes
collector = CrashCollector(...)
crashes = collector.collect_crashes(...)

# 3. Analyze
analyser = CrashAnalyser(...)
for crash in crashes:
    context = analyser.analyze_crash(crash.input_file, crash.signal)
```
### With LLM Analysis

```python
from packages.llm_analysis import AutonomousSecurityAgentV2

# Binary analysis -> LLM analysis
context = analyser.analyze_crash(...)
agent = AutonomousSecurityAgentV2(...)
exploit = agent.generate_exploit_for_crash(context)
patch = agent.generate_patch_for_crash(context)
```
## Performance

### Analysis Speed

- Per crash: 5-15 seconds
- With symbols: +2-5 seconds
- With ASAN: +3-8 seconds (richer info)

### Deduplication

- Stack hashing: < 1 second per crash
- Typical deduplication: 100 crashes → 10-20 unique
## Best Practices

- Compile with symbols (`-g`) for better analysis
- Enable ASAN for heap bug detection
- Prioritize by exploitability (focus on "exploitable" first)
- Deduplicate early to avoid redundant analysis
- Use LLM analysis for complex crashes