# Binary Analysis

## Overview
The Binary Analysis package provides comprehensive crash analysis capabilities for binary executables. It extracts exploitability information using debuggers (GDB/LLDB), disassembly, and LLM-powered analysis.
## Purpose

Analyze crashes and binaries with:

- Crash analysis: extract stack traces, registers, and crash instructions
- Debugger integration: GDB (Linux/Windows) and LLDB (macOS)
- Disassembly: function-level code analysis
- Exploitability assessment: automated triage
- Symbol table extraction: address-to-function mapping
## Architecture

```
packages/binary_analysis/
├── crash_analyser.py   # Main crash analysis engine
└── debugger.py         # GDB/LLDB integration
```
## Quick Start

### Basic Crash Analysis

```python
from pathlib import Path
from packages.binary_analysis import CrashAnalyser

# Initialize the analyser
analyser = CrashAnalyser(binary_path=Path("/path/to/binary"))

# Analyze a crash
context = analyser.analyze_crash(
    input_file=Path("crashes/crash_001"),
    signal="SIGSEGV",
)

print(f"Crash Type: {context.crash_type}")
print(f"Exploitability: {context.exploitability}")
print(f"Stack Trace:\n{context.stack_trace}")
print(f"Registers: {context.registers}")
```
### With LLM Analysis

```python
from pathlib import Path
from packages.binary_analysis import CrashAnalyser
from packages.llm_analysis import AutonomousSecurityAgentV2

# 1. Extract crash context
analyser = CrashAnalyser(Path("target_binary"))
context = analyser.analyze_crash(
    input_file=Path("crash_input"),
    signal="SIGSEGV",
)

# 2. LLM analysis for deeper understanding
agent = AutonomousSecurityAgentV2(
    repo_path=Path("."),
    out_dir=Path("out/analysis"),
)
exploit = agent.generate_exploit_for_crash(context)
print(f"Generated exploit:\n{exploit}")
```
## Core Classes

### CrashAnalyser

The main crash analysis engine.

```python
class CrashAnalyser:
    def __init__(self, binary_path: Path): ...

    def analyze_crash(
        self,
        input_file: Path,
        signal: str,
        timeout: int = 30,
    ) -> CrashContext: ...

    def get_stack_trace(self, input_file: Path) -> str: ...

    def get_registers(self, input_file: Path) -> Dict[str, str]: ...

    def get_crash_instruction(self, address: str) -> str: ...

    def estimate_exploitability(self, context: CrashContext) -> str: ...
```

Parameters:

- `binary_path`: Path to the target binary executable
### CrashContext

Complete context for a single crash.

```python
@dataclass
class CrashContext:
    crash_id: str
    binary_path: Path
    input_file: Path
    signal: str

    # From debugger
    stack_trace: str
    registers: Dict[str, str]
    crash_instruction: str
    crash_address: str
    stack_hash: str

    # From disassembly
    disassembly: str
    function_name: str
    source_location: str

    # Binary information
    binary_info: Dict[str, str]

    # Analysis results
    exploitability: str
    crash_type: str
    cvss_estimate: float
    analysis: Dict

    # Generated artifacts
    exploit_code: Optional[str]
```

Key fields:

- `crash_id`: Unique crash identifier (SHA256 hash)
- `signal`: Signal that caused the crash (SIGSEGV, SIGABRT, SIGILL, etc.)
- `stack_trace`: Full stack trace from the debugger
- `registers`: CPU register values at the crash (PC, SP, etc.)
- `crash_instruction`: Disassembly of the crashing instruction
- `exploitability`: Exploitability estimate: "exploitable", "likely", "unlikely", or "not_exploitable"
- `crash_type`: Crash classification: "heap_overflow", "stack_overflow", "null_deref", "use_after_free", etc.
### GDBDebugger

GDB integration for Linux/Windows binaries.

```python
from pathlib import Path
from packages.binary_analysis import GDBDebugger

debugger = GDBDebugger(binary_path=Path("target"))

# Get a stack trace
stack_trace = debugger.get_stack_trace(input_file=Path("crash_input"))

# Get a register dump
registers = debugger.get_registers(input_file=Path("crash_input"))
print(f"PC: {registers['pc']}")
print(f"SP: {registers['sp']}")
```
## Analysis Workflow

### Complete Crash Triage

```python
import json
from pathlib import Path
from packages.binary_analysis import CrashAnalyser

analyser = CrashAnalyser(Path("target_binary"))

# Analyze multiple crashes
crash_dir = Path("crashes/")
results = []

for crash_file in crash_dir.glob("id:*"):
    # Extract the signal number from an AFL-style filename: id:000000,sig:11
    # (split off any trailing fields such as ",src:...")
    if ",sig:" in crash_file.name:
        signal_part = crash_file.name.split(",sig:")[1].split(",")[0]
    else:
        signal_part = "11"
    signal_map = {"11": "SIGSEGV", "6": "SIGABRT", "4": "SIGILL"}
    signal = signal_map.get(signal_part, "SIGSEGV")

    # Analyze the crash
    context = analyser.analyze_crash(input_file=crash_file, signal=signal)
    results.append({
        "crash_id": context.crash_id,
        "input": str(crash_file),
        "signal": signal,
        "type": context.crash_type,
        "exploitability": context.exploitability,
        "cvss": context.cvss_estimate,
        "stack_hash": context.stack_hash,
    })

# Sort by exploitability, most severe first
results.sort(key=lambda x: {
    "exploitable": 4,
    "likely": 3,
    "unlikely": 2,
    "not_exploitable": 1,
}.get(x["exploitability"], 0), reverse=True)

# Save the report
with open("crash_triage.json", "w") as f:
    json.dump(results, f, indent=2)

print(f"Analyzed {len(results)} crashes")
print(f"Exploitable: {sum(1 for r in results if r['exploitability'] == 'exploitable')}")
```
## Debugger Support

### Automatic Detection

The analyser automatically selects the appropriate debugger:

```python
# On Linux:   uses GDB
# On macOS:   uses LLDB (for Mach-O binaries)
# On Windows: uses GDB (if available)
analyser = CrashAnalyser(binary_path)
# The debugger is auto-detected from the platform and binary type
```
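The selection rule above can be sketched in a few lines. This is an illustrative stand-in for the analyser's internal logic, not the package's actual code:

```python
import platform

def pick_debugger() -> str:
    """Mirror the documented selection: LLDB on macOS, GDB elsewhere."""
    # macOS (Darwin) ships LLDB and debugs Mach-O binaries with it;
    # Linux and Windows fall back to GDB.
    return "lldb" if platform.system() == "Darwin" else "gdb"

print(pick_debugger())
```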
### GDB Commands

```python
# Internally run in GDB batch mode
commands = [
    "run < input_file",
    "bt full",          # Full backtrace
    "info registers",   # Register dump
    "x/i $pc",          # Crash instruction
    "info frame",       # Frame info
]
```
### LLDB Commands (macOS)

```python
# For macOS binaries
commands = [
    "process launch -i input_file",
    "bt all",           # Backtrace
    "register read",    # Registers
    "disassemble -p",   # Crash instruction
]
```
## Crash Classification

### Automatic Classification

The analyser classifies crashes based on:

- Signal type: SIGSEGV, SIGABRT, SIGILL, SIGFPE
- Crash address: NULL, heap, stack, or code regions
- Instruction pattern: write, read, execute, call
- Stack trace patterns: malloc/free, memcpy, strcpy
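As a rough sketch of how these factors combine, the toy classifier below applies rules of this shape. The thresholds and patterns are illustrative assumptions, not the package's actual heuristics:

```python
def classify_crash(signal: str, crash_address: int, stack_trace: str) -> str:
    """Toy classifier combining signal, fault address, and trace patterns."""
    if signal == "SIGFPE":
        return "integer_overflow"
    if signal == "SIGABRT" and "free" in stack_trace:
        return "double_free"
    if signal == "SIGSEGV":
        if crash_address < 0x1000:              # faulting near the NULL page
            return "null_deref"
        if "malloc" in stack_trace or "free" in stack_trace:
            return "heap_overflow"
        if any(fn in stack_trace for fn in ("memcpy", "strcpy")):
            return "stack_overflow"
    return "unknown"

print(classify_crash("SIGSEGV", 0x0, "#0 main"))   # null_deref
```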
### Crash Types

| Type | Signal | Exploitability | Description |
|---|---|---|---|
| `heap_overflow` | SIGSEGV | High | Heap buffer overflow |
| `stack_overflow` | SIGSEGV | High | Stack buffer overflow |
| `use_after_free` | SIGSEGV | High | Use-after-free |
| `double_free` | SIGABRT | Medium | Double free |
| `null_deref` | SIGSEGV | Low | NULL pointer dereference |
| `integer_overflow` | SIGFPE | Medium | Integer overflow |
| `format_string` | SIGSEGV | High | Format string vulnerability |
## Exploitability Assessment

### Heuristic Analysis

```python
context = analyser.analyze_crash(input_file, signal)

# Exploitability factors:
# - Crash type (heap overflow = high)
# - Write vs. read (write = higher)
# - Control over the crash address (PC control = exploitable)
# - ASAN detection (heap bugs = exploitable)
# - Stack trace patterns (strcpy/memcpy = likely)

if context.exploitability == "exploitable":
    print("High-confidence exploitable vulnerability")
elif context.exploitability == "likely":
    print("Likely exploitable with some effort")
```
### Exploitability Levels

| Level | Criteria | Action |
|---|---|---|
| `exploitable` | PC control, heap overflow, format string | Immediate analysis |
| `likely` | Stack overflow, UAF, double free | Deep analysis |
| `unlikely` | NULL deref, assert failure | Low priority |
| `not_exploitable` | Timeout, OOM, intentional abort | Skip |
## Binary Information

```python
context = analyser.analyze_crash(input_file, signal)

print("Binary Information:")
for key, value in context.binary_info.items():
    print(f"  {key}: {value}")

# Example output:
#   arch: x86_64
#   os: Linux
#   stripped: False
#   nx: True
#   pie: True
#   relro: Full
#   canary: True
#   asan: False
```
### Check Security Features

```python
from packages.exploit_feasibility import analyze_binary

# Deep binary analysis
result = analyze_binary("/path/to/binary")
print(f"PIE: {result['mitigations']['pie_enabled']}")
print(f"Canary: {result['mitigations']['stack_canary']}")
print(f"NX: {result['mitigations']['nx_enabled']}")
print(f"RELRO: {result['mitigations']['relro']}")
```
## Symbol Table

### Address-to-Function Mapping

```python
# The symbol table is cached automatically
analyser = CrashAnalyser(binary_path)

# Resolve a crash address to a function (internal cache)
function = analyser._symbol_cache.get("0x401234")
print(f"Crashed in function: {function}")
```
### Source Location

```python
# If debug symbols are available
if context.source_location:
    print(f"Crash location: {context.source_location}")
    # Example: "src/parser.c:145"
```
## Stack Hashing

### Deduplication

```python
# Crashes with the same stack_hash are duplicates
crashes = []
for input_file in crash_files:
    context = analyser.analyze_crash(input_file, "SIGSEGV")
    crashes.append(context)

# Deduplicate by stack hash
unique_crashes = {}
for crash in crashes:
    if crash.stack_hash not in unique_crashes:
        unique_crashes[crash.stack_hash] = crash

print(f"Total crashes: {len(crashes)}")
print(f"Unique crashes: {len(unique_crashes)}")
```
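A common way to compute such a hash is to digest only the top few frames, so crashes that share a root cause but diverge deeper in the call stack still collide. The function below is a hedged sketch of that idea, not the analyser's actual implementation:

```python
import hashlib

def stack_hash(stack_trace: str, top_frames: int = 5) -> str:
    """Hash the innermost frames of a textual backtrace for deduplication."""
    frames = [line.strip() for line in stack_trace.splitlines() if line.strip()]
    key = "\n".join(frames[:top_frames])
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Same top frames, different outer callers -> same hash
a = stack_hash("#0 strcpy\n#1 parse\n#2 main\n#3 init", top_frames=3)
b = stack_hash("#0 strcpy\n#1 parse\n#2 main\n#3 other_caller", top_frames=3)
print(a == b)   # True
```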
## Configuration

### Timeouts

```python
# Adjust the crash analysis timeout
context = analyser.analyze_crash(
    input_file=crash_file,
    signal="SIGSEGV",
    timeout=60,  # 60 seconds (default: 30)
)
```
### Required Tools

The analyser checks for these tools:

- `nm`: Symbol table extraction
- `addr2line`: Address-to-source resolution
- `objdump`: Disassembly
- `readelf`: ELF header analysis (Linux)
- `file`: File type identification
- `strings`: String extraction
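A minimal pre-flight check for these helpers can be done with the standard library; this is a sketch, and the analyser's own detection logic may differ:

```python
import shutil

TOOLS = ["nm", "addr2line", "objdump", "readelf", "file", "strings"]

# Report any binutils helpers missing from PATH; analysis steps that
# need them (symbolization, disassembly, ...) would be skipped or degraded.
missing = [tool for tool in TOOLS if shutil.which(tool) is None]
print(f"missing: {missing}")
```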
## Output Structure

```
out/crashes/
├── crash_abc123/
│   ├── input             # Crashing input
│   ├── stack_trace.txt   # Full stack trace
│   ├── registers.txt     # Register dump
│   ├── disassembly.txt   # Crash instruction disassembly
│   ├── context.json      # Serialized CrashContext
│   └── analysis.json     # LLM analysis (if available)
└── crash_def456/
```
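This layout can be produced with plain `pathlib`. The sketch below uses a dict in place of a real CrashContext, and the exact serialization the package uses is an assumption:

```python
import json
import tempfile
from pathlib import Path

def write_crash_dir(out_root: Path, crash_id: str, context: dict) -> Path:
    """Persist one crash's artifacts in the out/crashes layout shown above."""
    crash_dir = out_root / f"crash_{crash_id}"
    crash_dir.mkdir(parents=True, exist_ok=True)
    (crash_dir / "stack_trace.txt").write_text(context["stack_trace"])
    (crash_dir / "registers.txt").write_text(json.dumps(context["registers"]))
    (crash_dir / "context.json").write_text(json.dumps(context, indent=2))
    return crash_dir

root = Path(tempfile.mkdtemp())
d = write_crash_dir(root, "abc123", {
    "stack_trace": "#0 strcpy\n#1 main",
    "registers": {"pc": "0x401234", "sp": "0x7ffc0000"},
})
print(sorted(p.name for p in d.iterdir()))
# ['context.json', 'registers.txt', 'stack_trace.txt']
```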
## Integration

### With Fuzzing

```python
from packages.fuzzing import AFLRunner, CrashCollector
from packages.binary_analysis import CrashAnalyser

# 1. Fuzz
runner = AFLRunner(...)
runner.run_parallel_fuzzers(...)

# 2. Collect crashes
collector = CrashCollector(...)
crashes = collector.collect_crashes(...)

# 3. Analyze
analyser = CrashAnalyser(...)
for crash in crashes:
    context = analyser.analyze_crash(crash.input_file, crash.signal)
```
### With LLM Analysis

```python
from packages.llm_analysis import AutonomousSecurityAgentV2

# Binary analysis -> LLM analysis
context = analyser.analyze_crash(...)
agent = AutonomousSecurityAgentV2(...)
exploit = agent.generate_exploit_for_crash(context)
patch = agent.generate_patch_for_crash(context)
```
## Performance

### Analysis Speed

- Per crash: 5-15 seconds
- With symbols: +2-5 seconds
- With ASAN: +3-8 seconds (richer info)

### Deduplication

- Stack hashing: < 1 second per crash
- Typical deduplication: 100 crashes → 10-20 unique
## Best Practices

- Compile with symbols (`-g`) for better analysis
- Enable ASAN for heap bug detection
- Prioritize by exploitability (focus on "exploitable" first)
- Deduplicate early to avoid redundant analysis
- Use LLM analysis for complex crashes