Overview
Sleigh is a domain-specific language designed for rapid processor specification. It provides a formal method to describe:
- Instruction encoding and decoding
- Assembly language syntax
- Instruction semantics via p-code translation
- Register definitions and address spaces
- Context-dependent instruction behavior
P-Code: Ghidra’s Intermediate Representation
P-code is a low-level intermediate representation that captures the semantics of machine instructions in a processor-independent format.
Key Concepts
Address Spaces
Address spaces represent different memory regions and storage areas:
- RAM space: Main memory
- Register space: Processor registers
- Unique space: Temporary variables
- Constant space: Immediate values
- User-defined spaces: Custom memory regions
Varnodes
Varnodes are the fundamental data objects in p-code:
- Represent values at specific locations
- Defined by address space, offset, and size
- Can reference registers, memory, or temporary values
Example: the RSP register is a varnode in the register space at a specific offset.
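To make the (space, offset, size) triple concrete, here is a small Sleigh sketch. The offsets and sizes are illustrative assumptions, not copied from a shipped specification; the point is only that each register name is shorthand for one varnode.

```sleigh
# Hypothetical fragment: offsets and sizes are illustrative.
define endian=little;
define space register type=register_space size=4;

# Each name becomes a varnode (register space, offset, size):
#   RAX -> (register, 0x00, 8)
#   RCX -> (register, 0x08, 8)
#   ...
#   RSP -> (register, 0x20, 8)
define register offset=0x00 size=8 [ RAX RCX RDX RBX RSP ];

# A second definition may overlap the first. With little-endian layout,
# ESP names the low 4 bytes of RSP: (register, 0x20, 4).
define register offset=0x20 size=4 [ ESP ];
```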
Operations
P-code uses a RISC-like instruction set with operations like:
- Arithmetic: INT_ADD, INT_SUB, INT_MULT, INT_DIV
- Logic: INT_AND, INT_OR, INT_XOR, INT_NEGATE
- Comparison: INT_EQUAL, INT_LESS, INT_SLESS
- Memory: LOAD, STORE
- Control flow: BRANCH, CBRANCH, CALL, RETURN
- Floating-point: FLOAT_ADD, FLOAT_MULT, etc.
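As a rough illustration of how these operations arise, the sketch below defines a made-up 16-bit instruction set and notes, per constructor, the main p-code operation its semantic statement is expected to produce. Every register name, field, and opcode value here is an assumption chosen for the example.

```sleigh
# Hypothetical 16-bit ISA; all names and encodings are invented.
define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];
define register offset=0x20 size=1 [ ZF ];

define token instr(16)
    op   = (12,15)
    reg1 = (10,11)
    reg2 = (8,9)
    imm8 = (0,7)
;
attach variables [ reg1 reg2 ] [ r0 r1 r2 r3 ];

# Branch target computed from the immediate at disassembly time.
dest: addr is imm8 [ addr = inst_next + imm8 * 2; ] { export *:4 addr; }

:ADD reg1, reg2 is op=0x1 & reg1 & reg2 { reg1 = reg1 + reg2; }   # INT_ADD
:AND reg1, reg2 is op=0x2 & reg1 & reg2 { reg1 = reg1 & reg2; }   # INT_AND
:CMP reg1, reg2 is op=0x3 & reg1 & reg2 { ZF = (reg1 == reg2); }  # INT_EQUAL
:LD reg1, reg2 is op=0x4 & reg1 & reg2 { reg1 = *[ram]:4 reg2; }  # LOAD
:ST reg1, reg2 is op=0x5 & reg1 & reg2 { *[ram]:4 reg2 = reg1; }  # STORE
:BZ dest is op=0x6 & dest { if (ZF) goto dest; }                  # CBRANCH
:RET is op=0x7 { return [r3]; }                                   # RETURN
```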
Sleigh Specification Structure
A Sleigh specification (.slaspec file) consists of several key sections:
1. Basic Definitions
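A minimal sketch of what the basic definitions might look like for a hypothetical little-endian processor with 32-bit addresses. The space names `ram` and `register` are conventional choices, and the sizes are assumptions for the example.

```sleigh
# Hypothetical basic definitions; sizes are in bytes.
define endian=little;     # byte order used for instructions and data
define alignment=2;       # instructions start on 2-byte boundaries

# Main memory; "default" makes it the space used when none is named.
define space ram type=ram_space size=4 default;

# Storage for processor registers.
define space register type=register_space size=4;
```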
2. Register Definitions
Registers are defined with names, locations, and sizes.
3. Token Definitions
Tokens describe how instruction bytes are parsed.
4. Constructors
Constructors are the heart of Sleigh specifications. Each constructor describes:
- Display section: Assembly syntax
- Bit pattern: Instruction encoding
- Semantic section: P-code translation
- :ADD reg1, reg2 - Display pattern (assembly syntax)
- is op=0x01 & reg1 & reg2 - Bit pattern matching
- { reg1 = reg1 + reg2; } - Semantic action (p-code)
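For context, the ADD constructor needs register and token definitions behind it before it can compile. The sketch below wraps it in a small hypothetical specification; the field positions, register offsets, and the 16-bit instruction width are assumptions made for illustration.

```sleigh
# Hypothetical minimal specification around the ADD constructor.
define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;

# Register definitions: four 4-byte registers at offsets 0x0, 0x4, 0x8, 0xc.
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];

# Token definition: one 16-bit instruction word split into bit fields.
define token instr(16)
    op   = (8,15)    # opcode
    reg1 = (4,5)     # first register operand
    reg2 = (0,1)     # second register operand
;

# Bind the register fields to the register names, indexed by field value.
attach variables [ reg1 reg2 ] [ r0 r1 r2 r3 ];

# Display section, then bit pattern after "is", then semantics in braces.
:ADD reg1, reg2 is op=0x01 & reg1 & reg2 { reg1 = reg1 + reg2; }
```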
5. Tables
Tables organize constructors into logical groups:
- Root table: Top-level instruction table
- Subtables: Operand types, addressing modes, etc.
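One way to picture the split between the root table and a subtable is the sketch below. It assumes a hypothetical `src` operand that can be either a register or an immediate, selected by a one-bit `mode` field; all names and encodings are invented.

```sleigh
# Hypothetical ISA fragment showing a subtable used by a root constructor.
define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];

define token instr(16)
    op   = (12,15)
    reg1 = (10,11)
    mode = (9,9)
    reg2 = (0,1)
    imm8 = (0,7)
;
attach variables [ reg1 reg2 ] [ r0 r1 r2 r3 ];

# Subtable "src": the table name comes before the colon.
src: reg2 is mode=0 & reg2 { export reg2; }
src: "#"^imm8 is mode=1 & imm8 { export *[const]:4 imm8; }

# Root-table constructor: nothing before the colon. It refers to the
# subtable by name in both the display section and the bit pattern.
:MOV reg1, src is op=0x2 & reg1 & src { reg1 = src; }
```

Both `src` constructors export a 4-byte value, so the root constructor can treat `src` uniformly regardless of which form was decoded.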
Advanced Features
Context Variables
Context variables handle mode changes and conditional instruction decoding:
- ARM/Thumb mode switching
- 16-bit vs 32-bit instruction decoding
- Conditional execution
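A simplified sketch of mode switching through context, loosely modeled on the ARM/Thumb idea. The context field `TMode`, the `SWITCH` instruction, and the encodings are hypothetical; the same opcode bits decode as ADD in one mode and SUB in the other.

```sleigh
# Hypothetical ISA fragment with a one-bit decoding mode.
define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];
define register offset=0x100 size=4 [ contextreg ];

# Context fields live in a dedicated register, not in the instruction bytes.
define context contextreg
    TMode = (0,0)
;

define token instr(16)
    op   = (12,15)
    reg1 = (10,11)
    reg2 = (8,9)
;
attach variables [ reg1 reg2 ] [ r0 r1 r2 r3 ];

# Same instruction bits, different meaning depending on the mode.
:ADD reg1, reg2 is TMode=0 & op=0x1 & reg1 & reg2 { reg1 = reg1 + reg2; }
:SUB reg1, reg2 is TMode=1 & op=0x1 & reg1 & reg2 { reg1 = reg1 - reg2; }

# Flip the mode for the instructions that follow this one.
:SWITCH is TMode=0 & op=0xf [ TMode = 1; globalset(inst_next, TMode); ] { }
:SWITCH is TMode=1 & op=0xf [ TMode = 0; globalset(inst_next, TMode); ] { }
```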
Macros
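As a small illustration, the sketch below factors flag updates into a p-code macro shared by two constructors. The macro name `setResultFlags`, the flag registers, and the encodings are assumptions for the example.

```sleigh
# Hypothetical ISA fragment using a p-code macro for flag updates.
define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];
define register offset=0x20 size=1 [ ZF SF ];

define token instr(16)
    op   = (12,15)
    reg1 = (10,11)
    reg2 = (8,9)
;
attach variables [ reg1 reg2 ] [ r0 r1 r2 r3 ];

# The macro body is expanded inline wherever the macro is invoked.
macro setResultFlags(result) {
    ZF = (result == 0);
    SF = (result s< 0);
}

:ADD reg1, reg2 is op=0x1 & reg1 & reg2 {
    reg1 = reg1 + reg2;
    setResultFlags(reg1);
}
:SUB reg1, reg2 is op=0x2 & reg1 & reg2 {
    reg1 = reg1 - reg2;
    setResultFlags(reg1);
}
```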
P-code macros encapsulate common operations.
Build Directives
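A sketch of a build directive, assuming a hypothetical condition subtable `cc` whose p-code skips the instruction when the condition fails. The `build cc;` statement states explicitly where the subtable's p-code is placed relative to the constructor's own statements.

```sleigh
# Hypothetical ISA fragment with a condition-code subtable.
define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];
define register offset=0x20 size=1 [ ZF ];

define token instr(16)
    op   = (12,15)
    reg1 = (10,11)
    reg2 = (8,9)
    cond = (0,1)
;
attach variables [ reg1 reg2 ] [ r0 r1 r2 r3 ];

# Each cc constructor contributes p-code that may skip the instruction.
cc: "al" is cond=0 { }
cc: "eq" is cond=1 { if (!ZF) goto inst_next; }
cc: "ne" is cond=2 { if (ZF) goto inst_next; }

:ADD^cc reg1, reg2 is op=0x1 & cc & reg1 & reg2 {
    build cc;              # emit the condition check at this point
    reg1 = reg1 + reg2;    # reached only if the check falls through
}
```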
Control instruction assembly.
Delay Slot Directives
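A hedged sketch of a delayed branch for a made-up 16-bit instruction set. The `delayslot(1)` statement marks where the p-code of the following instruction is placed, so it runs before the jump. The target calculation and encodings are assumptions for the example.

```sleigh
# Hypothetical ISA fragment with a one-instruction branch delay slot.
define endian=big;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];

define token instr(16)
    op    = (12,15)
    simm8 = (0,7) signed
;

# Target is relative to the instruction after the branch word (assumed).
dest: addr is simm8 [ addr = inst_start + 2 + simm8 * 2; ] { export *:4 addr; }

:BR dest is op=0x8 & dest {
    delayslot(1);    # p-code for the delay-slot instruction goes here
    goto dest;       # the branch takes effect afterwards
}
```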
Handle delayed branches (MIPS, SPARC).
Preprocessing
Sleigh supports preprocessing directives.
File Inclusion
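For file inclusion, a top-level specification can pull in shared pieces with `@include`. The file names below are hypothetical placeholders.

```sleigh
# myproc.slaspec (hypothetical): top-level file that assembles the pieces.
@include "myproc_base.sinc"     # registers, tokens, core instructions
@include "myproc_simd.sinc"     # optional extension, kept in its own file
```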
Macros and Conditionals
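A short sketch of preprocessor macros and conditionals. The names `REGSIZE` and `HAS_MUL` and the included file are assumptions; `$(NAME)` expands a macro defined with `@define`.

```sleigh
# Hypothetical fragment: preprocessor macros and conditional inclusion.
@define REGSIZE "4"
@define HAS_MUL "1"

define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;

# $(REGSIZE) is replaced with 4 before the line is parsed.
define register offset=0x00 size=$(REGSIZE) [ r0 r1 r2 r3 ];

# Only pull in the multiply instructions when the variant has them.
@ifdef HAS_MUL
@include "myproc_mul.sinc"
@endif
```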
Real-World Examples
x86 Processor
Location: Ghidra/Processors/x86/data/languages/
The x86 specification uses modular includes:
- ia.sinc - Base IA-32 instruction set
- avx.sinc, avx2.sinc - SIMD extensions
- bmi1.sinc, bmi2.sinc - Bit manipulation
- sha.sinc - SHA extensions
- mpx.sinc - Memory Protection Extensions
ARM Processor
Location: Ghidra/Processors/ARM/data/languages/
ARM specifications handle:
- Multiple architecture versions (ARMv4-ARMv8)
- ARM/Thumb mode switching via context
- Conditional execution on most instructions
- NEON SIMD extensions
- Both endianness variants
RISC-V
Location: Ghidra/Processors/RISCV/data/languages/
RISC-V demonstrates:
- Clean RISC instruction encoding
- Modular extension support (M, A, F, D, C)
- 32-bit and 64-bit variants
- Compressed instruction set (C extension)
Dalvik (Android)
Location: Ghidra/Processors/Dalvik/data/languages/
Dalvik shows virtual machine instruction handling:
- Base specification: Dalvik_Base.slaspec
- Version-specific variants (KitKat through Android 12)
- Bytecode instruction semantics
- Register-based VM operations
Sleigh Compilation
Sleigh specifications are compiled into .sla files:
- Sleigh compiler parses .slaspec files
- Resolves includes and macros
- Validates constructor patterns
- Generates decision trees for instruction matching
- Produces binary .sla output
Language Definition Files
.ldefs Files
XML files that define language properties.
.pspec Files
Processor specifications define:
- Memory organization
- Default memory spaces
- Context register definitions
- Prototype evaluators
- Calling conventions
.cspec Files
Compiler specifications define:
- Calling conventions
- Stack frame layout
- Parameter passing
- Return value handling
- Register usage conventions
.opinion Files
Loader opinion files map binary formats to processors.
Creating Custom Processors
To add a new processor to Ghidra:
1. Create Directory Structure
2. Write Sleigh Specification
Define the instruction set in a .slaspec file:
- Endianness and alignment
- Address spaces
- Register definitions
- Token fields
- Instruction constructors
3. Define Language Properties
Create a .ldefs file specifying:
- Processor name and variant
- Endianness and word size
- Sleigh file reference
- Compiler specifications
4. Specify Processor Details
Create a .pspec file for:
- Memory maps
- Context registers
- Default assumptions
5. Add Compiler Specifications
Create .cspec files for:
- Calling conventions (cdecl, stdcall, etc.)
- Register roles (parameters, returns, preserved)
- Stack pointer conventions
6. Configure Loader Opinions
Create a .opinion file to map:
- Binary formats to your processor
- Default compiler specs
- Architecture detection rules
Best Practices
Modular Design
- Use @include for instruction set extensions
- Separate base ISA from optional features
- Share common patterns via includes
Clear Naming
- Use descriptive constructor names
- Follow assembly syntax conventions
- Document complex patterns
Testing
- Test with real binary samples
- Verify disassembly accuracy
- Validate p-code generation
- Check decompiler output
Performance
- Minimize constructor ambiguity
- Order patterns from specific to general
- Use context efficiently
Documentation Resources
Official Ghidra Documentation
Location: GhidraDocs/languages/html/
- sleigh.html - Complete Sleigh manual
- sleigh_constructors.html - Constructor syntax
- sleigh_tokens.html - Token definitions
- sleigh_context.html - Context variables
- pcoderef.html - P-code reference
- pcodedescription.html - Detailed p-code operations
Key Sections
- Introduction to P-Code - Fundamental concepts
- Basic Specification Layout - File structure
- Preprocessing - Includes and macros
- Basic Definitions - Endianness, spaces, registers
- Tokens and Fields - Instruction parsing
- Constructors - Pattern matching and semantics
- Using Context - Mode-dependent behavior
- P-code Tables - Operation reference
Common Patterns
Instruction Variants
Conditional Execution
Addressing Modes
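One common way to model addressing modes is a subtable that computes the effective address, so each load or store needs only one constructor. The sketch below assumes a hypothetical `ea` subtable with register-indirect and register-plus-offset forms; all names and encodings are invented.

```sleigh
# Hypothetical ISA fragment: effective-address subtable.
define endian=little;
define alignment=2;
define space ram type=ram_space size=4 default;
define space register type=register_space size=4;
define register offset=0x00 size=4 [ r0 r1 r2 r3 ];

define token instr(16)
    op   = (12,15)
    reg1 = (10,11)
    reg2 = (8,9)
    mode = (7,7)
    imm4 = (0,3)
;
attach variables [ reg1 reg2 ] [ r0 r1 r2 r3 ];

# Both forms export a 4-byte value holding the address to access.
ea: "["^reg2^"]" is mode=0 & reg2 { export reg2; }
ea: "["^reg2, "#"^imm4^"]" is mode=1 & reg2 & imm4 {
    local addr:4 = reg2 + imm4;
    export addr;
}

# The load and store constructors are independent of the addressing form.
:LD reg1, ea is op=0x4 & reg1 & ea { reg1 = *[ram]:4 ea; }
:ST reg1, ea is op=0x5 & reg1 & ea { *[ram]:4 ea = reg1; }
```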
Debugging Sleigh Specifications
Common Issues
- Pattern Conflicts: Multiple constructors match same bits
- Missing Context: Context not properly set for mode switches
- Incorrect Semantics: P-code doesn’t match instruction behavior
- Field Overlap: Token fields defined incorrectly
Tools
- Ghidra’s Sleigh compiler error messages
- Disassembly testing in Ghidra
- P-code viewer in decompiler
- Instruction pattern debugger
Related Documentation
- Supported Architectures
- File Format Support
- Ghidra API documentation
- P-code emulator documentation
