
Overview

The Binary Analysis package provides comprehensive crash analysis capabilities for binary executables. It extracts exploitability information using debuggers (GDB/LLDB), disassembly, and LLM-powered analysis.

Purpose

Analyze crashes and binaries with:
  • Crash analysis: Extract stack traces, registers, crash instructions
  • Debugger integration: GDB (Linux/Windows) and LLDB (macOS)
  • Disassembly: Function-level code analysis
  • Exploitability assessment: Automated triage
  • Symbol table extraction: Address-to-function mapping

Architecture

packages/binary_analysis/
├── crash_analyser.py      # Main crash analysis engine
└── debugger.py            # GDB/LLDB integration

Quick Start

Basic Crash Analysis

from pathlib import Path
from packages.binary_analysis import CrashAnalyser

# Initialize analyzer
analyser = CrashAnalyser(
    binary_path=Path("/path/to/binary")
)

# Analyze a crash
context = analyser.analyze_crash(
    input_file=Path("crashes/crash_001"),
    signal="SIGSEGV"
)

print(f"Crash Type: {context.crash_type}")
print(f"Exploitability: {context.exploitability}")
print(f"Stack Trace:\n{context.stack_trace}")
print(f"Registers: {context.registers}")

With LLM Analysis

from packages.binary_analysis import CrashAnalyser
from packages.llm_analysis import AutonomousSecurityAgentV2

# 1. Extract crash context
analyser = CrashAnalyser(Path("target_binary"))
context = analyser.analyze_crash(
    input_file=Path("crash_input"),
    signal="SIGSEGV"
)

# 2. LLM analysis for deeper understanding
agent = AutonomousSecurityAgentV2(
    repo_path=Path("."),
    out_dir=Path("out/analysis")
)

exploit = agent.generate_exploit_for_crash(context)
print(f"Generated exploit:\n{exploit}")

Core Classes

CrashAnalyser

Main crash analysis engine.
class CrashAnalyser:
    def __init__(self, binary_path: Path)
    
    def analyze_crash(
        self,
        input_file: Path,
        signal: str,
        timeout: int = 30
    ) -> CrashContext
    
    def get_stack_trace(
        self,
        input_file: Path
    ) -> str
    
    def get_registers(
        self,
        input_file: Path
    ) -> Dict[str, str]
    
    def get_crash_instruction(
        self,
        address: str
    ) -> str
    
    def estimate_exploitability(
        self,
        context: CrashContext
    ) -> str
Parameters:
  • binary_path (Path, required): Path to the target binary executable

CrashContext

Complete context for a crash.
@dataclass
class CrashContext:
    crash_id: str
    binary_path: Path
    input_file: Path
    signal: str
    
    # From debugger
    stack_trace: str
    registers: Dict[str, str]
    crash_instruction: str
    crash_address: str
    stack_hash: str
    
    # From disassembly
    disassembly: str
    function_name: str
    source_location: str
    
    # Binary information
    binary_info: Dict[str, str]
    
    # Analysis results
    exploitability: str
    crash_type: str
    cvss_estimate: float
    analysis: Dict
    
    # Generated artifacts
    exploit_code: Optional[str]
Key fields:
  • crash_id (str): Unique crash identifier (SHA256 hash)
  • signal (str): Signal that caused the crash (SIGSEGV, SIGABRT, SIGILL, etc.)
  • stack_trace (str): Full stack trace from the debugger
  • registers (Dict[str, str]): CPU register values at crash (PC, SP, etc.)
  • crash_instruction (str): Disassembly of the crashing instruction
  • exploitability (str): Exploitability estimate: "exploitable", "likely", "unlikely", "not_exploitable"
  • crash_type (str): Crash classification: "heap_overflow", "stack_overflow", "null_deref", "use_after_free", etc.

GDBDebugger

GDB integration for Linux/Windows binaries.
from packages.binary_analysis import GDBDebugger

debugger = GDBDebugger(binary_path=Path("target"))

# Get stack trace
stack_trace = debugger.get_stack_trace(
    input_file=Path("crash_input")
)

# Get register dump
registers = debugger.get_registers(
    input_file=Path("crash_input")
)

print(f"PC: {registers['pc']}")
print(f"SP: {registers['sp']}")

Analysis Workflow

Complete Crash Triage

from pathlib import Path
from packages.binary_analysis import CrashAnalyser
import json

analyser = CrashAnalyser(Path("target_binary"))

# Analyze multiple crashes
crash_dir = Path("crashes/")
results = []

for crash_file in crash_dir.glob("id:*"):
    # Extract signal from AFL-style filenames, e.g. id:000000,sig:11,src:...
    # (take only the part between ",sig:" and the next comma)
    signal_part = (
        crash_file.name.split(",sig:")[1].split(",")[0]
        if ",sig:" in crash_file.name
        else "11"
    )
    signal_map = {"11": "SIGSEGV", "6": "SIGABRT", "4": "SIGILL"}
    signal = signal_map.get(signal_part, "SIGSEGV")
    
    # Analyze crash
    context = analyser.analyze_crash(
        input_file=crash_file,
        signal=signal
    )
    
    results.append({
        "crash_id": context.crash_id,
        "input": str(crash_file),
        "signal": signal,
        "type": context.crash_type,
        "exploitability": context.exploitability,
        "cvss": context.cvss_estimate,
        "stack_hash": context.stack_hash
    })

# Sort by exploitability
results.sort(key=lambda x: {
    "exploitable": 4,
    "likely": 3,
    "unlikely": 2,
    "not_exploitable": 1
}.get(x["exploitability"], 0), reverse=True)

# Save report
with open("crash_triage.json", "w") as f:
    json.dump(results, f, indent=2)

print(f"Analyzed {len(results)} crashes")
print(f"Exploitable: {sum(1 for r in results if r['exploitability'] == 'exploitable')}")

Debugger Support

Automatic Detection

The analyser automatically selects the appropriate debugger:
# On Linux: uses GDB
# On macOS: uses LLDB (for Mach-O binaries)
# On Windows: uses GDB (if available)

analyser = CrashAnalyser(binary_path)
# Debugger auto-detected based on platform and binary type
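That selection logic can be sketched roughly as follows; select_debugger is a hypothetical helper shown for illustration, not part of the package API:

```python
def select_debugger(system: str, gdb_available: bool = True) -> str:
    """Sketch of the platform-based debugger choice (hypothetical helper)."""
    if system == "Darwin":
        return "lldb"   # macOS Mach-O binaries -> LLDB
    if gdb_available:
        return "gdb"    # Linux, or Windows when GDB is installed
    raise RuntimeError("no supported debugger available on this platform")
```

In the real analyser, the equivalent of `system` would come from `platform.system()` and `gdb_available` from a PATH lookup.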

GDB Commands

# Internally uses GDB batch mode
commands = [
    "run < input_file",
    "bt full",           # Full backtrace
    "info registers",    # Register dump
    "x/i $pc",          # Crash instruction
    "info frame",       # Frame info
]

LLDB Commands (macOS)

# For macOS binaries
commands = [
    "process launch -i input_file",
    "bt all",           # Backtrace
    "register read",    # Registers
    "disassemble -p",  # Crash instruction
]

Crash Classification

Automatic Classification

The analyser classifies crashes based on:
  1. Signal type: SIGSEGV, SIGABRT, SIGILL, SIGFPE
  2. Crash address: NULL, heap, stack, code regions
  3. Instruction pattern: Write, read, execute, call
  4. Stack trace patterns: malloc/free, memcpy, strcpy
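These factors combine into a classifier roughly like the sketch below. classify_crash is hypothetical and far simpler than the package's real logic, but it follows the same decision order: signal first, then crash address, then stack-trace patterns.

```python
def classify_crash(signal: str, crash_address: str, stack_trace: str) -> str:
    """Rough crash-classification sketch (illustrative, not the real engine)."""
    addr = int(crash_address, 16)
    if signal == "SIGFPE":
        return "integer_overflow"
    if signal == "SIGABRT" and "free" in stack_trace:
        return "double_free"
    if signal == "SIGSEGV":
        if addr < 0x1000:                # fault in page zero -> NULL deref
            return "null_deref"
        if any(fn in stack_trace for fn in ("memcpy", "strcpy", "malloc")):
            return "heap_overflow"       # classic copy/alloc patterns
    return "unknown"
```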

Crash Types

Type             | Signal  | Exploitability | Description
heap_overflow    | SIGSEGV | High           | Heap buffer overflow
stack_overflow   | SIGSEGV | High           | Stack buffer overflow
use_after_free   | SIGSEGV | High           | Use-after-free
double_free      | SIGABRT | Medium         | Double free
null_deref       | SIGSEGV | Low            | NULL pointer dereference
integer_overflow | SIGFPE  | Medium         | Integer overflow
format_string    | SIGSEGV | High           | Format string vulnerability

Exploitability Assessment

Heuristic Analysis

context = analyser.analyze_crash(input_file, signal)

# Exploitability factors:
# - Crash type (heap overflow = high)
# - Write vs read (write = higher)
# - Control over crash address (PC control = exploitable)
# - ASAN detection (heap bugs = exploitable)
# - Stack trace patterns (strcpy/memcpy = likely)

if context.exploitability == "exploitable":
    print("High confidence exploitable vulnerability")
elif context.exploitability == "likely":
    print("Likely exploitable with some effort")

Exploitability Levels

Level           | Criteria                                  | Action
exploitable     | PC control, heap overflow, format string  | Immediate analysis
likely          | Stack overflow, UAF, double free          | Deep analysis
unlikely        | NULL deref, assert failure                | Low priority
not_exploitable | Timeout, OOM, intentional abort           | Skip
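One way to drive triage from these levels is to sort crash contexts by priority; triage_order below is a hypothetical helper that works on any objects exposing an `exploitability` attribute:

```python
# Lower number = higher triage priority, per the levels above
PRIORITY = {"exploitable": 0, "likely": 1, "unlikely": 2, "not_exploitable": 3}

def triage_order(contexts):
    """Return crash contexts sorted most-exploitable-first (sketch)."""
    return sorted(contexts, key=lambda c: PRIORITY.get(c.exploitability, 4))
```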

Binary Information

Extract Binary Details

context = analyser.analyze_crash(input_file, signal)

print("Binary Information:")
for key, value in context.binary_info.items():
    print(f"  {key}: {value}")

# Example output:
# arch: x86_64
# os: Linux
# stripped: False
# nx: True
# pie: True
# relro: Full
# canary: True
# asan: False

Check Security Features

from packages.exploit_feasibility import analyze_binary

# Deep binary analysis
result = analyze_binary("/path/to/binary")

print(f"PIE: {result['mitigations']['pie_enabled']}")
print(f"Canary: {result['mitigations']['stack_canary']}")
print(f"NX: {result['mitigations']['nx_enabled']}")
print(f"RELRO: {result['mitigations']['relro']}")

Symbol Table

Address-to-Function Mapping

# Symbol table automatically cached
analyser = CrashAnalyser(binary_path)

# Resolve a crash address to a function name
# (note: _symbol_cache is an internal attribute, accessed here for illustration)
function = analyser._symbol_cache.get("0x401234")
print(f"Crashed in function: {function}")

Source Location

# If debug symbols available
if context.source_location:
    print(f"Crash location: {context.source_location}")
    # Example: "src/parser.c:145"

Stack Hashing

Deduplication

# Crashes with same stack_hash are duplicates
crashes = []
for input_file in crash_files:
    context = analyser.analyze_crash(input_file, "SIGSEGV")
    crashes.append(context)

# Deduplicate by stack hash
unique_crashes = {}
for crash in crashes:
    if crash.stack_hash not in unique_crashes:
        unique_crashes[crash.stack_hash] = crash

print(f"Total crashes: {len(crashes)}")
print(f"Unique crashes: {len(unique_crashes)}")
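A plausible way such a stack hash could be computed is to hash the top frames' function names while ignoring addresses, so the same bug hit at different (ASLR-shifted) load addresses still collides. The sketch below assumes GDB-style frame lines; the package's actual hashing scheme may differ:

```python
import hashlib
import re

def stack_hash(stack_trace: str, depth: int = 5) -> str:
    """Hash the top `depth` frame function names, ignoring addresses (sketch)."""
    frames = []
    for line in stack_trace.splitlines():
        # GDB frames look like: '#0  0x00401234 in parse_header (...)'
        m = re.search(r"in (\w+)", line)
        if m:
            frames.append(m.group(1))
        if len(frames) >= depth:
            break
    return hashlib.sha256("|".join(frames).encode()).hexdigest()[:16]
```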

Configuration

Timeouts

# Adjust crash analysis timeout
context = analyser.analyze_crash(
    input_file=crash_file,
    signal="SIGSEGV",
    timeout=60  # 60 seconds (default: 30)
)

Available Tools

The analyser checks for these tools:
  • nm - Symbol table extraction
  • addr2line - Address to source resolution
  • objdump - Disassembly
  • readelf - ELF header analysis (Linux)
  • file - File type identification
  • strings - String extraction
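Availability of these tools can be probed with a few lines of standard-library Python; this is an illustrative sketch, not the package's internal check:

```python
import shutil

REQUIRED_TOOLS = ["nm", "addr2line", "objdump", "readelf", "file", "strings"]

def check_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}
```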

Output Structure

out/crashes/
├── crash_abc123/
│   ├── input                  # Crashing input
│   ├── stack_trace.txt        # Full stack trace
│   ├── registers.txt          # Register dump
│   ├── disassembly.txt        # Crash instruction disassembly
│   ├── context.json           # CrashContext serialized
│   └── analysis.json          # LLM analysis (if available)
└── crash_def456/
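Assuming each context.json holds a plain JSON object as laid out above, the serialized reports can be iterated with a small sketch like this (load_crash_reports is a hypothetical helper):

```python
import json
from pathlib import Path

def load_crash_reports(crashes_dir: Path):
    """Yield each serialized CrashContext dict under out/crashes/ (sketch)."""
    for ctx_file in sorted(crashes_dir.glob("crash_*/context.json")):
        with ctx_file.open() as f:
            yield json.load(f)
```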

Integration

With Fuzzing

from packages.fuzzing import AFLRunner, CrashCollector
from packages.binary_analysis import CrashAnalyser

# 1. Fuzz
runner = AFLRunner(...)
runner.run_parallel_fuzzers(...)

# 2. Collect crashes
collector = CrashCollector(...)
crashes = collector.collect_crashes(...)

# 3. Analyze
analyser = CrashAnalyser(...)
for crash in crashes:
    context = analyser.analyze_crash(crash.input_file, crash.signal)

With LLM Analysis

from packages.llm_analysis import AutonomousSecurityAgentV2

# Binary analysis -> LLM analysis
context = analyser.analyze_crash(...)

agent = AutonomousSecurityAgentV2(...)
exploit = agent.generate_exploit_for_crash(context)
patch = agent.generate_patch_for_crash(context)

Performance

Analysis Speed

  • Per crash: 5-15 seconds
  • With symbols: +2-5 seconds
  • With ASAN: +3-8 seconds (richer info)

Deduplication

  • Stack hashing: < 1 second per crash
  • Typical deduplication: 100 crashes → 10-20 unique

Best Practices

  1. Compile with symbols (-g) for better analysis
  2. Enable ASAN for heap bug detection
  3. Prioritize by exploitability (focus on "exploitable" first)
  4. Deduplicate early to avoid redundant analysis
  5. Use LLM analysis for complex crashes
