Overview
The compiler module converts an Abstract Syntax Tree (AST) into executable bytecode. It uses a two-pass compilation strategy:
- First Pass: Collect label addresses and build symbol table
- Second Pass: Emit bytecode with resolved label references
Because every label address is known before any bytecode is emitted, a label can be referenced before it is defined (a forward reference).
Public API
assemble
pub fn assemble(source: &str) -> Result<Vec<u8>>
Main entry point for assembling source code into bytecode. This function performs lexing, parsing, and compilation in a single call.
source (&str): Assembly source code as a string
Returns Result<Vec<u8>>: Compiled bytecode, or an error if assembly fails
Example:
use minichain_assembler::assemble;
let source = r#"
.entry main
main:
LOADI R0, 42
HALT
"#;
let bytecode = assemble(source)?;
assemble_with_ast
pub fn assemble_with_ast(source: &str) -> Result<(Program, Vec<u8>)>
Assemble source code and return both the parsed AST and compiled bytecode. Useful for debugging or when you need access to the program structure.
source (&str): Assembly source code as a string
Returns Result<(Program, Vec<u8>)>: Tuple of (parsed program, compiled bytecode), or an error
Compiler API
Compiler::new
Creates a new compiler instance with empty label and constant tables.
Compiler::compile
pub fn compile(program: &Program) -> Result<Vec<u8>>
Compiles a program AST to bytecode.
program (&Program): The parsed AST program to compile
Returns Result<Vec<u8>>: Compiled bytecode as a byte vector, or a compilation error
Opcode Reference
The compiler emits the following opcodes:
Control Flow (0x00-0x0F)
Unconditional jump (1 register)
Conditional jump (2 registers)
Call subroutine (1 register)
Arithmetic (0x10-0x1F)
Subtraction (3 registers)
Multiplication (3 registers)
Add immediate (2 registers + u64 immediate)
Bitwise (0x20-0x2F)
Bitwise AND (3 registers)
Bitwise XOR (3 registers)
Bitwise NOT (2 registers)
Shift right (3 registers)
Comparison (0x30-0x3F)
Greater than (3 registers)
Less than or equal (3 registers)
Greater than or equal (3 registers)
Is zero check (2 registers)
Memory (0x40-0x4F)
Load 8-bit from memory (2 registers)
Load 64-bit from memory (2 registers)
Store 8-bit to memory (2 registers)
Store 64-bit to memory (2 registers)
Get memory size (1 register)
Copy memory (3 registers)
Storage (0x50-0x5F)
Load from storage (2 registers)
Store to storage (2 registers)
Immediate and Move (0x70-0x7F)
Load immediate (1 register + u64 immediate)
Move between registers (2 registers)
Context (0x80-0x8F)
Get caller address (1 register)
Get call value (1 register)
Get current address (1 register)
Get block number (1 register)
Get timestamp (1 register)
Get remaining gas (1 register)
Debug (0xF0-0xFF)
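Since each group occupies one 16-opcode range, the group of an opcode can be read from its high nibble. The following is a minimal sketch of that lookup, not part of the crate's API: `OpcodeGroup` and `opcode_group` are hypothetical names, and the `ImmediateMove` group is an assumption based on the LOADI (0x70) and MOV (0x71) opcodes used in the encoding examples on this page.

```rust
/// Opcode groups from the reference above, keyed by the opcode's high nibble.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpcodeGroup {
    ControlFlow,   // 0x00-0x0F
    Arithmetic,    // 0x10-0x1F
    Bitwise,       // 0x20-0x2F
    Comparison,    // 0x30-0x3F
    Memory,        // 0x40-0x4F
    Storage,       // 0x50-0x5F
    ImmediateMove, // 0x70-0x7F (assumed name; LOADI = 0x70, MOV = 0x71)
    Context,       // 0x80-0x8F
    Debug,         // 0xF0-0xFF
}

/// Returns the group an opcode byte falls into, or `None` for ranges
/// that this page does not describe.
pub fn opcode_group(opcode: u8) -> Option<OpcodeGroup> {
    match opcode >> 4 {
        0x0 => Some(OpcodeGroup::ControlFlow),
        0x1 => Some(OpcodeGroup::Arithmetic),
        0x2 => Some(OpcodeGroup::Bitwise),
        0x3 => Some(OpcodeGroup::Comparison),
        0x4 => Some(OpcodeGroup::Memory),
        0x5 => Some(OpcodeGroup::Storage),
        0x7 => Some(OpcodeGroup::ImmediateMove),
        0x8 => Some(OpcodeGroup::Context),
        0xF => Some(OpcodeGroup::Debug),
        _ => None,
    }
}
```

A lookup like this can be handy in a disassembler or when sanity-checking emitted bytecode.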
Register Encoding
Registers are encoded in 4-bit nibbles (0-15 for R0-R15):
- Single register: Stored in upper nibble, lower nibble is 0
- Two registers: Packed as (reg1 << 4) | reg2
- Three registers: First two packed in byte 1, third in upper nibble of byte 2 (lower nibble is 0)
Examples
// Single register: JUMP R5
// Bytecode: [0x02, 0x50]
// [JUMP, R5 << 4]
// Two registers: MOV R0, R1
// Bytecode: [0x71, 0x01]
// [MOV, (R0 << 4) | R1]
// Three registers: ADD R2, R0, R1
// Bytecode: [0x10, 0x20, 0x10]
// [ADD, (R2 << 4) | R0, R1 << 4]
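The packing rules above can be sketched as three small helpers. This is an illustrative sketch rather than the compiler's actual code: `check_reg`, `pack_one`, `pack_two`, and `pack_three` are hypothetical names, and the `String` error stands in for whatever error type the crate uses.

```rust
/// Checks that a register index is in R0-R15, i.e. fits in one nibble.
fn check_reg(reg: u8) -> Result<u8, String> {
    if reg <= 15 {
        Ok(reg)
    } else {
        Err(format!("register R{} is outside R0-R15", reg))
    }
}

/// One register: upper nibble holds the register, lower nibble is 0.
pub fn pack_one(r: u8) -> Result<[u8; 1], String> {
    Ok([check_reg(r)? << 4])
}

/// Two registers: (reg1 << 4) | reg2 in a single byte.
pub fn pack_two(r1: u8, r2: u8) -> Result<[u8; 1], String> {
    Ok([(check_reg(r1)? << 4) | check_reg(r2)?])
}

/// Three registers: first two share a byte, the third takes the
/// upper nibble of a second byte.
pub fn pack_three(r1: u8, r2: u8, r3: u8) -> Result<[u8; 2], String> {
    let [first] = pack_two(r1, r2)?;
    let [second] = pack_one(r3)?;
    Ok([first, second])
}
```

With these helpers, the operand bytes of `ADD R2, R0, R1` come out as `[0x20, 0x10]`, matching the example above.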
Immediate values are encoded as 64-bit little-endian integers:
// LOADI R0, 255
// Bytecode: [0x70, 0x00, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
// [LOADI, R0, value as 8 bytes LE]
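As a rough sketch of the immediate encoding, the helpers below build and read back a LOADI instruction using the standard library's `to_le_bytes` and `from_le_bytes`. The function names are hypothetical, and placing the register in the upper nibble of the second byte is an assumption carried over from the single-register rule (the example above only shows R0, which encodes as 0x00 either way).

```rust
use std::convert::TryFrom;

/// LOADI opcode, taken from the example above.
const LOADI: u8 = 0x70;

/// Encodes `LOADI reg, value` as 10 bytes: opcode, register byte,
/// then the immediate as 8 little-endian bytes.
pub fn encode_loadi(reg: u8, value: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(10);
    out.push(LOADI);
    // Assumption: the register sits in the upper nibble.
    out.push((reg & 0x0F) << 4);
    out.extend_from_slice(&value.to_le_bytes());
    out
}

/// Reads the 64-bit immediate back out of an encoded LOADI.
/// Returns `None` if fewer than 10 bytes are available.
pub fn decode_loadi_immediate(bytes: &[u8]) -> Option<u64> {
    let raw = bytes.get(2..10)?;
    let array = <[u8; 8]>::try_from(raw).ok()?;
    Some(u64::from_le_bytes(array))
}
```

Little-endian means the least significant byte comes first, so an immediate of 0x0102 is stored as 0x02 followed by 0x01 and six zero bytes.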
Error Types
CompileError
pub enum CompileError {
UndefinedLabel(String),
DuplicateLabel { label: String, first_addr: u64 },
InvalidRegister(u8),
}
UndefinedLabel: Label referenced but never defined
DuplicateLabel: Label defined multiple times at different addresses
InvalidRegister: Register number outside valid range (0-15)
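For reporting these errors to users, one option is a `Display` implementation. The sketch below redeclares the enum so that it compiles on its own. The message wording and the `validate_register` helper are assumptions for illustration, and the crate may already provide its own formatting.

```rust
use std::fmt;

/// Mirrors the enum shown above so the sketch is self-contained.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    UndefinedLabel(String),
    DuplicateLabel { label: String, first_addr: u64 },
    InvalidRegister(u8),
}

impl fmt::Display for CompileError {
    // Message texts are hypothetical, not the crate's actual output.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompileError::UndefinedLabel(label) => {
                write!(f, "undefined label: {}", label)
            }
            CompileError::DuplicateLabel { label, first_addr } => write!(
                f,
                "duplicate label: {} (first defined at address {})",
                label, first_addr
            ),
            CompileError::InvalidRegister(reg) => {
                write!(f, "invalid register: R{} (expected R0-R15)", reg)
            }
        }
    }
}

impl std::error::Error for CompileError {}

/// Hypothetical helper: rejects register numbers outside 0-15.
pub fn validate_register(reg: u8) -> Result<u8, CompileError> {
    if reg <= 15 {
        Ok(reg)
    } else {
        Err(CompileError::InvalidRegister(reg))
    }
}
```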
Usage Examples
Basic Compilation
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;
let source = r#"
LOADI R0, 10
HALT
"#;
let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();
// LOADI R0, 10: [0x70, 0x00, 0x0A, 0x00, ...]
// HALT: [0x00]
assert_eq!(bytecode[0], 0x70); // LOADI opcode
assert_eq!(bytecode[1], 0x00); // R0
assert_eq!(bytecode[2], 10); // immediate value
assert_eq!(bytecode[10], 0x00); // HALT
Label Resolution
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;
let source = r#"
.entry main
main:
LOADI R0, 10
LOADI R5, loop_end
JUMP R5
loop_end:
HALT
"#;
let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();
// Labels are resolved to addresses during compilation
assert!(!bytecode.is_empty());
Forward References
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;
let source = r#"
LOADI R5, end ; Forward reference
JUMP R5
ADD R0, R1, R2
end:
HALT
"#;
let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();
// Forward references work due to two-pass compilation
assert!(!bytecode.is_empty());
Constants
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;
let source = r#"
.const MAX_VALUE 100
LOADI R0, MAX_VALUE
HALT
"#;
let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();
// Constant is resolved to its value
assert_eq!(bytecode[0], 0x70); // LOADI
assert_eq!(bytecode[1], 0x00); // R0
assert_eq!(bytecode[2], 100); // MAX_VALUE
Three-Register Instruction
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;
let source = "ADD R2, R0, R1";
let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();
// ADD R2, R0, R1
// Byte 0: 0x10 (ADD opcode)
// Byte 1: 0x20 (R2 in upper nibble, R0 in lower nibble)
// Byte 2: 0x10 (R1 in upper nibble)
assert_eq!(bytecode, vec![0x10, 0x20, 0x10]);
Error Handling: Undefined Label
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::{Compiler, CompileError};
let source = "LOADI R5, undefined_label";
let program = Parser::parse(source).unwrap();
let result = Compiler::compile(&program);
match result {
Err(CompileError::UndefinedLabel(label)) => {
assert_eq!(label, "undefined_label");
}
_ => panic!("Expected undefined label error"),
}
Error Handling: Duplicate Label
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::{Compiler, CompileError};
let source = r#"
main:
HALT
main:
HALT
"#;
let program = Parser::parse(source).unwrap();
let result = Compiler::compile(&program);
match result {
Err(CompileError::DuplicateLabel { label, first_addr }) => {
assert_eq!(label, "main");
assert_eq!(first_addr, 0);
}
_ => panic!("Expected duplicate label error"),
}
Complete Example
use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;
let source = r#"
.entry main
.const ITERATIONS 10
main:
LOADI R0, 0 ; counter = 0
LOADI R1, ITERATIONS ; max = 10
LOADI R5, loop_body ; load loop address
loop_body:
ADDI R0, R0, 1 ; counter++
LT R2, R0, R1 ; counter < max?
LOADI R5, loop_body ; reload loop address
JUMPI R2, R5 ; jump if condition true
HALT ; exit
"#;
let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();
println!("Compiled {} bytes", bytecode.len());
println!("Bytecode: {:02X?}", bytecode);
Compilation Pipeline
First Pass: Symbol Collection
- Iterate through all statements
- Track current bytecode address
- Record label addresses in symbol table
- Store constant values
- Validate no duplicate labels
- Calculate instruction sizes for address tracking
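The first pass can be sketched roughly as follows. This is a simplified model, not the crate's implementation: `Stmt`, `Symbols`, `FirstPassError`, and `first_pass` are hypothetical, and each instruction is reduced to its encoded size in bytes.

```rust
use std::collections::HashMap;

/// Simplified statement model (hypothetical).
pub enum Stmt {
    Label(String),
    Const(String, u64),
    /// An instruction, reduced to its encoded size in bytes.
    Instr(u64),
}

#[derive(Debug, PartialEq)]
pub enum FirstPassError {
    DuplicateLabel { label: String, first_addr: u64 },
}

pub struct Symbols {
    pub labels: HashMap<String, u64>,
    pub consts: HashMap<String, u64>,
}

/// Walks the statements once, recording label addresses and constants.
pub fn first_pass(stmts: &[Stmt]) -> Result<Symbols, FirstPassError> {
    let mut labels: HashMap<String, u64> = HashMap::new();
    let mut consts: HashMap<String, u64> = HashMap::new();
    let mut addr: u64 = 0; // current bytecode address

    for stmt in stmts {
        match stmt {
            Stmt::Label(name) => {
                // A label seen twice is an error; report where it was first defined.
                if let Some(&first_addr) = labels.get(name) {
                    return Err(FirstPassError::DuplicateLabel {
                        label: name.clone(),
                        first_addr,
                    });
                }
                labels.insert(name.clone(), addr);
            }
            Stmt::Const(name, value) => {
                consts.insert(name.clone(), *value);
            }
            // Only instructions advance the address.
            Stmt::Instr(size) => addr += *size,
        }
    }
    Ok(Symbols { labels, consts })
}
```

Labels and constants occupy no bytes, so only instructions move the address counter forward.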
Second Pass: Bytecode Emission
- Iterate through all statements again
- Skip labels and directives (non-code)
- For each instruction:
- Emit opcode byte
- Encode register operands
- Resolve label references to addresses
- Emit immediate values in little-endian
- Return complete bytecode vector
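The emission steps above might look like the sketch below, which takes an already-built label table as input. The `Instr` and `Stmt` types and the `UndefinedLabel` error are hypothetical stand-ins for the crate's AST and error types. Only HALT (0x00), JUMP (0x02), and LOADI (0x70) are covered, using the opcode values from the examples on this page, and the LOADI register byte is assumed to follow the single-register layout.

```rust
use std::collections::HashMap;

/// Tiny instruction subset (hypothetical AST).
pub enum Instr {
    Halt,
    Jump(u8),
    LoadI(u8, u64),
    LoadILabel(u8, String),
}

pub enum Stmt {
    Label(String),
    Code(Instr),
}

#[derive(Debug, PartialEq)]
pub struct UndefinedLabel(pub String);

/// Emits LOADI: opcode, register byte, 8-byte little-endian immediate.
fn emit_loadi(out: &mut Vec<u8>, reg: u8, value: u64) {
    out.push(0x70);
    out.push((reg & 0x0F) << 4); // assumption: register in upper nibble
    out.extend_from_slice(&value.to_le_bytes());
}

/// Second pass: emit bytes, resolving label operands via `labels`.
pub fn second_pass(
    stmts: &[Stmt],
    labels: &HashMap<String, u64>,
) -> Result<Vec<u8>, UndefinedLabel> {
    let mut out = Vec::new();
    for stmt in stmts {
        match stmt {
            // Labels produce no bytes.
            Stmt::Label(_) => {}
            Stmt::Code(Instr::Halt) => out.push(0x00),
            Stmt::Code(Instr::Jump(reg)) => {
                out.push(0x02);
                out.push((*reg & 0x0F) << 4);
            }
            Stmt::Code(Instr::LoadI(reg, value)) => emit_loadi(&mut out, *reg, *value),
            Stmt::Code(Instr::LoadILabel(reg, name)) => {
                let addr = *labels
                    .get(name)
                    .ok_or_else(|| UndefinedLabel(name.clone()))?;
                emit_loadi(&mut out, *reg, addr);
            }
        }
    }
    Ok(out)
}
```

Because the label table is complete before emission starts, a forward reference such as `LOADI R5, end` resolves the same way as a backward one.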
Each instruction follows a consistent format:
- 1 byte: Opcode
- 0-2 bytes: Register operands (packed in nibbles)
- 0-8 bytes: Immediate value (64-bit little-endian)
Size Examples
HALT: 1 byte total
JUMP R5: 2 bytes total
MOV R0, R1: 2 bytes total
ADD R2, R0, R1: 3 bytes total
LOADI R0, 255: 10 bytes total (1 opcode + 1 register + 8 immediate)
ADDI R0, R1, 10: 10 bytes total (1 opcode + 1 packed registers + 8 immediate)
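The sizes above follow directly from the instruction format, so they can be computed from the operand shape alone. A minimal sketch, assuming a hypothetical `instruction_size` helper that is not part of the crate:

```rust
/// Computes the encoded size of an instruction in bytes from its
/// operand shape: 1 opcode byte, one byte per pair of registers
/// (rounded up), and 8 bytes if an immediate is present.
pub fn instruction_size(num_registers: usize, has_immediate: bool) -> usize {
    let opcode_bytes = 1;
    // Two registers fit in one byte, so round up to whole bytes.
    let register_bytes = (num_registers + 1) / 2;
    let immediate_bytes = if has_immediate { 8 } else { 0 };
    opcode_bytes + register_bytes + immediate_bytes
}
```

The first pass needs exactly this kind of calculation to advance the address counter without emitting any bytes.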
Symbol Table
The compiler maintains two tables:
Label Table
Maps label names to bytecode addresses:
HashMap<String, u64>
// Example:
// "main" -> 0
// "loop_body" -> 20
// "end" -> 45
Constant Table
Maps constant names to values:
HashMap<String, u64>
// Example:
// "MAX_VALUE" -> 100
// "ITERATIONS" -> 10
When resolving a LoadILabel instruction, the compiler checks both tables.
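A lookup across both tables could be sketched as below. The `resolve` function is hypothetical, and the lookup order (labels first, then constants) is an assumption, since this page does not state which table takes precedence when a name appears in both.

```rust
use std::collections::HashMap;

/// Resolves a symbolic operand to a value by checking the label table
/// and then the constant table. The order is an assumption.
pub fn resolve(
    name: &str,
    labels: &HashMap<String, u64>,
    consts: &HashMap<String, u64>,
) -> Option<u64> {
    labels.get(name).or_else(|| consts.get(name)).copied()
}
```

A `None` result here corresponds to the case the compiler reports as `UndefinedLabel`.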