Binary Crash Analysis

Overview

RAPTOR provides autonomous crash analysis for C/C++ binaries using debugger integration and deterministic replay. The system extracts crash context, classifies vulnerability types, and assesses exploitability.

Crash analysis combines debugger traces, disassembly, memory layout analysis, and symbol resolution to reconstruct what the program was doing at the point of failure.

Features

Multi-Debugger Support: Automatic detection of GDB or LLDB
ASan Integration: Enhanced diagnostics for sanitizer-instrumented binaries
rr Replay: Deterministic debugging with reverse execution
Function Tracing: Call trace visualization with Perfetto
Crash Classification: Automatic vulnerability type detection
Memory Analysis: Region identification and protection status

Crash Analysis Workflow

Crash Context Extraction

The analyzer extracts comprehensive crash information:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict

@dataclass
class CrashContext:
    """Complete context for a crash."""
    crash_id: str
    binary_path: Path
    input_file: Path
    signal: str

    # From debugger
    stack_trace: str = ""
    registers: Dict[str, str] = field(default_factory=dict)
    crash_instruction: str = ""
    crash_address: str = ""
    stack_hash: str = ""  # For deduplication

    # From disassembly
    disassembly: str = ""
    function_name: str = "unknown"
    source_location: str = ""  # file:line

    # Binary information
    binary_info: Dict[str, str] = field(default_factory=dict)

    # Analysis results
    exploitability: str = "unknown"
    crash_type: str = "unknown"
    cvss_estimate: float = 0.0

Debugger Integration

GDB Analysis

For Linux binaries and general debugging:

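import subprocess
import tempfile
from pathlib import Path

def _run_gdb_analysis(self, input_file: Path) -> str:
    """Run GDB in batch mode on the crashing input and return its output."""
    gdb_commands = [
        "set pagination off",
        "set confirm off",
        "set print pretty on",
        "handle SIGSEGV stop",
        "handle SIGABRT stop",
        f"run < '{input_file}'",
        "info registers",
        "backtrace full",
        "x/10i $pc",  # Disassemble at crash
        "x/20xw $sp", # Examine stack
        "quit",
    ]

    # cmd_file is not defined in the original snippet; writing the commands
    # to a temporary script file is assumed here
    with tempfile.NamedTemporaryFile("w", suffix=".gdb", delete=False) as f:
        f.write("\n".join(gdb_commands) + "\n")
        cmd_file = f.name

    try:
        result = subprocess.run(
            ["gdb", "-batch", "-x", cmd_file, str(self.binary)],
            capture_output=True,
            text=True,  # decode output so the return value really is a str
            timeout=30,
        )
    finally:
        Path(cmd_file).unlink(missing_ok=True)
    return result.stdout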
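The raw GDB output still has to be turned into the `registers` dictionary of `CrashContext`. The following is a minimal sketch of that step, assuming the usual `info registers` layout on x86-64 (register name, hex value, then a decoded value). The helper name `parse_gdb_registers` is hypothetical and not part of RAPTOR.

```python
import re
from typing import Dict

# One register per line: "<name>   0x<hex>   <decoded value>".
# Lines from the backtrace or memory dumps start with "#", "0x", or
# whitespace, so they do not match this pattern.
_REG_LINE = re.compile(r"^([a-z][a-z0-9]*)\s+(0x[0-9a-fA-F]+)\b", re.MULTILINE)


def parse_gdb_registers(gdb_output: str) -> Dict[str, str]:
    """Extract {register: hex value} pairs from `info registers` output."""
    return {name: value for name, value in _REG_LINE.findall(gdb_output)}


if __name__ == "__main__":
    sample = (
        "rax            0x0                 0\n"
        "rsp            0x7ffe3c5f9a10      0x7ffe3c5f9a10\n"
        "rip            0x401136            0x401136 <main+4>\n"
    )
    print(parse_gdb_registers(sample))
```

Values are kept as hex strings so they can be stored in `CrashContext.registers` unchanged.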

LLDB Analysis

For macOS binaries (Mach-O format):

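import subprocess
import tempfile
from pathlib import Path

def _run_lldb_analysis(self, input_file: Path) -> str:
    """Run LLDB to analyze crash (macOS)."""
    lldb_commands = [
        "settings set auto-confirm true",
        "process handle SIGSEGV -s true",
        f"process launch -i {input_file}",
        "register read",
        "thread backtrace --extended true",
        "disassemble --count 10 --start-address $pc",
        "memory read --size 4 --format x --count 20 $sp",
        "quit",
    ]

    # cmd_file is not defined in the original snippet; writing the commands
    # to a temporary script file is assumed here
    with tempfile.NamedTemporaryFile("w", suffix=".lldb", delete=False) as f:
        f.write("\n".join(lldb_commands) + "\n")
        cmd_file = f.name

    try:
        result = subprocess.run(
            ["lldb", "-s", cmd_file, str(self.binary)],
            capture_output=True,
            text=True,  # decode output so the return value really is a str
            timeout=60,
        )
    finally:
        Path(cmd_file).unlink(missing_ok=True)
    return result.stdout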
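Choosing between the two debuggers can be sketched as a lookup on `PATH`. This is an illustrative sketch, not RAPTOR's actual detection logic: the function name `detect_debugger` and the rule "prefer LLDB on macOS, GDB elsewhere" are assumptions based on the two sections above.

```python
import shutil
import sys
from typing import Callable, Optional


def detect_debugger(
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return "gdb" or "lldb" depending on platform and availability.

    The preferred debugger is tried first; the other one is the fallback.
    Returns None when neither executable is on PATH.
    """
    order = ("lldb", "gdb") if platform == "darwin" else ("gdb", "lldb")
    for name in order:
        if which(name):
            return name
    return None


if __name__ == "__main__":
    print(detect_debugger())
```

Passing `which` as a parameter keeps the function testable on machines where neither debugger is installed.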

ASan Integration

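AddressSanitizer (ASan) reports the bug type, the faulting access, and the allocation history of the affected memory, which a plain debugger trace does not show. Prefer ASan builds whenever they are available.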

Detection

def _detect_asan_binary(self) -> bool:
    """Detect if binary was compiled with AddressSanitizer."""
    result = subprocess.run(
        ["nm", str(self.binary)],
        capture_output=True,
        text=True
    )
    
    asan_symbols = [
        "__asan_", "__sanitizer", 
        "__asan_report", "__asan_handle"
    ]
    
    for symbol in asan_symbols:
        if symbol in result.stdout:
            return True
    return False

ASan Output Parsing

ASan provides detailed diagnostics:

AddressSanitizer: heap-buffer-overflow on address 0x602000000015 at pc 0x00000040123c
READ of size 1 at 0x602000000015 thread T0
    #0 0x40123b in process_data /src/server.c:145
    #1 0x401456 in main /src/server.c:234

0x602000000015 is located 0 bytes to the right of 5-byte region [0x602000000010,0x602000000015)
allocated by thread T0 here:
    #0 0x7f8b4c in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10c4c)
    #1 0x40115e in allocate_buffer /src/server.c:89

The analyzer extracts:

Crash type (heap-buffer-overflow)
Access type (READ)
Crash address
Stack trace
Allocation trace

rr Deterministic Debugging

rr records a program run and replays it deterministically with full reverse execution, which makes it possible to step backward from a crash to its cause.

Recording a Crash

# Record execution
rr record ./vulnerable_program < crash_input.txt

# Replay with reverse execution
rr replay

Reverse Execution Commands

Once in replay mode (GDB interface):

Regular Crashes

# Go back 100 steps from crash
reverse-next 100

# Now step forward to see execution leading to crash
next
next
print buffer
x/20xb buffer

# View stack trace
bt

ASan Crashes

# Go up to last application frame (before ASan runtime)
up
up
up

# Set breakpoint at that location
break *$pc

# Reverse to last app instruction before ASan
reverse-continue

# Now step forward
next
print *ptr

Automated Trace Extraction

Use the crash trace script:

# Extract execution trace before crash
python scripts/crash_trace.py trace.rr

# Output: detailed execution log with register values

Function Call Tracing

Visualize execution flow with function tracing and Perfetto UI.

Setup

Build Instrumentation Library

gcc -c -fPIC trace_instrument.c -o trace_instrument.o
gcc -shared trace_instrument.o -o libtrace.so -ldl -lpthread

Instrument Binary

Add to build:

CFLAGS += -finstrument-functions -g
LDFLAGS += -L. -ltrace -ldl -lpthread

Run with Tracing

export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
./program < crash_input.txt
# Creates trace_<tid>.log files

Convert to Perfetto

./trace_to_perfetto trace_*.log -o trace.json
# Open trace.json at ui.perfetto.dev

Trace Format

[seq] [timestamp] [dots] [ENTRY|EXIT!] function_name
[0] [1.000000000]  [ENTRY] main
[1] [1.000050000] . [ENTRY] process_request
[2] [1.000100000] .. [ENTRY] parse_input
[3] [1.000120000] ... [ENTRY] strcpy    ← Crashes here

Dots indicate call depth, which makes the execution path to the crash easy to follow.
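To illustrate how such a log can be consumed, the sketch below parses trace lines into begin/end events in the Chrome JSON trace event format, which the Perfetto UI can open, and reports the calls that never returned. It is not the actual `trace_to_perfetto` tool. The function names are hypothetical, and the exit marker is assumed to be either `EXIT` or `EXIT!`.

```python
import json
import re
from typing import Dict, Iterable, List

# [seq] [sec.nanosec] <dots> [ENTRY|EXIT] function_name
_LINE = re.compile(
    r"^\[(\d+)\] \[(\d+)\.(\d+)\]\s+(\.*)\s*\[(ENTRY|EXIT!?)\]\s+(\S+)"
)


def parse_trace(lines: Iterable[str], tid: int = 0) -> List[Dict]:
    """Convert trace lines to trace events with microsecond timestamps."""
    events = []
    for line in lines:
        m = _LINE.match(line)
        if not m:
            continue  # skip headers and unrelated lines
        _seq, sec, frac, _dots, kind, name = m.groups()
        micros = int(sec) * 1_000_000 + int(frac.ljust(9, "0")[:9]) // 1000
        events.append({
            "name": name,
            "ph": "B" if kind == "ENTRY" else "E",  # begin / end
            "ts": micros,
            "pid": 1,
            "tid": tid,
        })
    return events


def open_calls(events: List[Dict]) -> List[str]:
    """Return functions entered but never exited, outermost first."""
    stack: List[str] = []
    for ev in events:
        if ev["ph"] == "B":
            stack.append(ev["name"])
        elif stack:
            stack.pop()
    return stack


def to_chrome_trace(events: List[Dict]) -> str:
    """Serialize events as a JSON trace document."""
    return json.dumps({"traceEvents": events})
```

For the four sample lines above, the last entry of `open_calls` is `strcpy`, the function in which the program crashed.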

Crash Classification

Automatic classification based on signals and context:

def classify_crash_type(self, context: CrashContext) -> str:
    """Classify crash type based on available information."""
    signal = context.signal.lower()
    
    if signal in ["11", "sigsegv"]:
        # Segmentation fault - analyze further
        memory_region = context.binary_info.get("memory_region", "")
        
        if "heap" in memory_region:
            return "heap_overflow"
        elif "stack" in memory_region:
            return "stack_overflow"
        # Any address in the first page counts as a NULL dereference
        elif context.crash_address and int(context.crash_address, 16) < 0x1000:
            return "null_pointer_dereference"
        else:
            return "memory_access_violation"
    
    elif signal in ["6", "sigabrt"]:
        if context.binary_info.get("asan_enabled") == "true":
            return "asan_detected_bug"
        elif "double free" in context.stack_trace.lower():
            return "double_free"
        else:
            return "abort_signal"
    
    # ... more classifications
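The classifier expects a signal number or name in `context.signal`. When the target is run directly through `subprocess` on a POSIX system, a negative return code `-N` means the process was killed by signal `N`, so the signal can be derived as sketched below. The helper name `signal_from_returncode` is hypothetical.

```python
import signal
from typing import Optional


def signal_from_returncode(returncode: int) -> Optional[str]:
    """Map a subprocess return code to a signal name.

    Returns None for a normal exit (return code >= 0). Unknown signal
    numbers are returned as their decimal string.
    """
    if returncode >= 0:
        return None
    try:
        return signal.Signals(-returncode).name
    except ValueError:
        return str(-returncode)


if __name__ == "__main__":
    print(signal_from_returncode(-11))
```

Processes started through a shell typically report `128 + N` instead, which this sketch does not handle.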

Crash Types

Heap Overflow

Buffer overflow in heap-allocated memory.

Indicators:

Crash in malloc/free
ASan: heap-buffer-overflow
Memory region: heap

Stack Overflow

Buffer overflow on the stack.

Indicators:

Crash in strcpy/memcpy
Stack canary detection
Memory region: stack

Use-After-Free

Access to freed memory.

Indicators:

ASan: heap-use-after-free
Crash in heap access
Invalid heap metadata

Double Free

Freeing memory twice.

Indicators:

SIGABRT in free()
ASan: double-free
Heap corruption

NULL Dereference

Dereferencing a NULL pointer.

Indicators:

SIGSEGV at 0x0
PC at low address
NULL pointer in register

Format String

Format string vulnerability.

Indicators:

Crash in printf family
%n or %s in input
Abnormal format string

Memory Region Analysis

Identify which memory region was accessed:

def _analyze_memory_regions(self, context: CrashContext) -> Dict[str, str]:
    """Classify the crash address by address range (x86-64 Linux heuristics)."""
    crash_addr = int(context.crash_address, 16)

    # Null page
    if crash_addr < 0x1000:
        return {
            "memory_region": "null_page",
            "analysis": "Likely NULL pointer dereference"
        }

    # Check proximity to the stack pointer before the range checks: on
    # x86-64 Linux the stack also lies above 0x7f0000000000, so the mmap
    # check below would otherwise shadow it
    sp = int(context.registers.get("rsp", "0"), 16)
    if sp and abs(crash_addr - sp) < 0x10000:
        return {
            "memory_region": "stack",
            "relative_to_stack": "near_stack_pointer"
        }

    # Linux mmap region
    if crash_addr >= 0x7f0000000000:
        return {
            "memory_region": "mmap_region",
            "analysis": "Heap or library memory"
        }

    # Common PIE base
    if crash_addr >= 0x555555554000:
        return {
            "memory_region": "pie_base",
            "analysis": "PIE executable code/data"
        }

    # No heuristic matched
    return {"memory_region": "unknown"}
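Address-range heuristics are approximate. When the process memory map is available in the Linux `/proc/<pid>/maps` text format, the region can be looked up exactly. The sketch below assumes that text has already been captured. The names `Mapping`, `parse_maps`, and `find_region` are hypothetical and not part of RAPTOR.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int    # exclusive
    perms: str  # e.g. "rw-p"
    path: str   # file path, "[heap]", "[stack]", or "" for anonymous


def parse_maps(maps_text: str) -> List[Mapping]:
    """Parse lines of the form 'start-end perms offset dev inode [path]'."""
    mappings = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def find_region(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if it is unmapped."""
    for mapping in mappings:
        if mapping.start <= addr < mapping.end:
            return mapping
    return None
```

An address that falls in no mapping is itself a useful signal: it usually means a wild or corrupted pointer rather than an out-of-bounds access within a valid region.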

Exploitability Assessment

Assess whether a crash is exploitable. Ratings are high, medium, or low.

High Exploitability

Stack Buffer Overflow:

Overwrites return address
No stack canary
Controlled input size

Heap Overflow:

Overwrites heap metadata
No heap hardening
Predictable allocation pattern

Format String:

%n writes enabled
Attacker controls format string
Known binary base
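The indicator lists above can be encoded as a lookup table. The sketch below is an illustration only. It assumes that all three indicators must be present for a "high" rating, which the text does not state, and it returns "unknown" otherwise because the criteria for medium and low ratings are not listed here. The indicator identifiers and the function name are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Indicator names are made up for this sketch; they mirror the lists above.
HIGH_EXPLOITABILITY_INDICATORS: Dict[str, FrozenSet[str]] = {
    "stack_overflow": frozenset({
        "overwrites_return_address",
        "no_stack_canary",
        "controlled_input_size",
    }),
    "heap_overflow": frozenset({
        "overwrites_heap_metadata",
        "no_heap_hardening",
        "predictable_allocation_pattern",
    }),
    "format_string": frozenset({
        "percent_n_writes_enabled",
        "attacker_controls_format_string",
        "known_binary_base",
    }),
}


def assess_exploitability(crash_type: str, observed: Set[str]) -> Tuple[str, Set[str]]:
    """Return (rating, missing indicators) for a classified crash.

    Rating is "high" only when every listed indicator was observed
    (an assumption of this sketch), otherwise "unknown".
    """
    required = HIGH_EXPLOITABILITY_INDICATORS.get(crash_type)
    if required is None:
        return "unknown", set()
    missing = set(required - observed)
    return ("high" if not missing else "unknown"), missing
```

Returning the missing indicators alongside the rating shows an analyst what still needs to be confirmed before a crash can be rated.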

Best Practices

Always Use ASan Builds

AddressSanitizer provides the best crash diagnostics:

# Build with ASan
gcc -fsanitize=address -g -O1 -fno-omit-frame-pointer source.c

# Run and catch bugs
./program < crash_input.txt

ASan detects:

Buffer overflows (stack and heap)
Use-after-free
Double-free
Memory leaks (via LeakSanitizer, on platforms that support it)

Use rr for Complex Crashes

Record-replay helps understand non-deterministic crashes:

# Record once
rr record ./program

# Replay as many times as needed
rr replay  # Same execution every time

Especially useful for:

Race conditions
Heap corruption
Complex state machines

Generate Core Dumps

Enable core dumps for post-mortem analysis:

# Enable core dumps
ulimit -c unlimited

# Run program
./program

# Analyze core
gdb ./program core
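When the target is launched from Python rather than from a shell, the effect of `ulimit -c` can be approximated with the standard `resource` module (Unix only). The sketch below raises the soft core-size limit to the hard limit in the child process before it executes. The function names are hypothetical, and where the kernel writes the core file depends on the system's core pattern setting.

```python
import resource
import subprocess
from pathlib import Path
from typing import Tuple


def raise_core_limit() -> Tuple[int, int]:
    """Raise the soft RLIMIT_CORE to the hard limit; return the new limits."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(binary: Path, input_file: Path, timeout: int = 30) -> int:
    """Run binary with input_file on stdin and core dumps enabled."""
    with open(input_file, "rb") as stdin:
        proc = subprocess.run(
            [str(binary)],
            stdin=stdin,
            capture_output=True,
            timeout=timeout,
            preexec_fn=raise_core_limit,  # runs in the child before exec
        )
    return proc.returncode
```

If the hard limit is 0, the soft limit cannot be raised from within the process, and no core file will be produced.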

Deduplicate Crashes

Use stack hashes to identify unique crashes:

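import hashlib
from collections import defaultdict

# Stack hash for deduplication
# function_names: function names of the stack frames, innermost first
# (assumed to be parsed from the stack trace; not shown here)
stack_hash = hashlib.sha256(
    '|'.join(function_names[:10]).encode()
).hexdigest()[:16]

# Group by hash
unique_crashes = defaultdict(list)
for crash in all_crashes:
    unique_crashes[crash.stack_hash].append(crash)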
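The list of function names has to come from the debugger's stack trace. The sketch below extracts it from a GDB-style backtrace and computes the same 16-character hash. It assumes frames look like `#0  0x... in func (args) at file:line`. The helper names are hypothetical, and names containing spaces (for example some C++ template instantiations) are not matched.

```python
import hashlib
import re
from typing import List

# "#N  [0xADDR in ]function (args...)"; the address is optional in GDB output.
_GDB_FRAME = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(", re.MULTILINE
)


def frame_functions(backtrace: str) -> List[str]:
    """Return function names from a GDB backtrace, innermost frame first."""
    return _GDB_FRAME.findall(backtrace)


def compute_stack_hash(backtrace: str, depth: int = 10) -> str:
    """Hash the top `depth` function names, ignoring addresses and arguments."""
    names = frame_functions(backtrace)[:depth]
    return hashlib.sha256("|".join(names).encode()).hexdigest()[:16]


if __name__ == "__main__":
    bt = (
        "#0  0x0000000000401136 in process_data (buf=0x7ffe) at /src/server.c:145\n"
        "#1  0x0000000000401456 in main () at /src/server.c:234\n"
    )
    print(frame_functions(bt), compute_stack_hash(bt))
```

Hashing only function names keeps the hash stable across runs with address space layout randomization, at the cost of merging distinct crashes that share the same call path.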

See Also

Vulnerability Analysis: LLM-powered security analysis
Exploit Generation: Generate exploit PoCs
