Compiler - Minichain

Overview

The compiler module converts an Abstract Syntax Tree (AST) into executable bytecode. It uses a two-pass compilation strategy:

First Pass: Collect label addresses and build symbol table
Second Pass: Emit bytecode with resolved label references

This approach supports forward references, where a label is used before it is defined.
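The toy sketch below makes the two passes concrete. It is not the crate's implementation. The `Stmt` enum and `assemble_two_pass` function are hypothetical and cover only three instructions. The opcode bytes (LOADI 0x70, JUMP 0x02, HALT 0x00), the register nibble packing, and the instruction sizes come from the opcode reference and encoding rules on this page. The first pass only adds up sizes and records where each label lands. That lets the second pass resolve `end` even though the LOADI that uses it comes first.

```rust
use std::collections::HashMap;

/// Hypothetical toy statement type (not the crate's AST).
#[derive(Debug, Clone)]
pub enum Stmt {
    /// `name:` label definition; emits no bytes.
    Label(String),
    /// `LOADI Rn, label`: load a label's address as a u64 immediate.
    LoadILabel(u8, String),
    /// `JUMP Rn`
    Jump(u8),
    /// `HALT`
    Halt,
}

/// Encoded length in bytes of each statement.
fn encoded_len(stmt: &Stmt) -> u64 {
    match stmt {
        Stmt::Label(_) => 0,
        Stmt::LoadILabel(..) => 10, // opcode + register byte + 8-byte immediate
        Stmt::Jump(_) => 2,         // opcode + register byte
        Stmt::Halt => 1,            // opcode only
    }
}

/// Assemble the toy program in two passes.
pub fn assemble_two_pass(program: &[Stmt]) -> Result<Vec<u8>, String> {
    // First pass: walk the program, tracking the address, and record labels.
    let mut labels: HashMap<&str, u64> = HashMap::new();
    let mut addr = 0u64;
    for stmt in program {
        if let Stmt::Label(name) = stmt {
            labels.insert(name.as_str(), addr);
        }
        addr += encoded_len(stmt);
    }

    // Second pass: emit bytes; every label address is now known.
    let mut out = Vec::new();
    for stmt in program {
        match stmt {
            Stmt::Label(_) => {} // labels produce no bytecode
            Stmt::LoadILabel(reg, name) => {
                let target = labels
                    .get(name.as_str())
                    .ok_or_else(|| format!("undefined label: {name}"))?;
                out.push(0x70); // LOADI
                out.push(*reg << 4); // single register in the upper nibble
                out.extend_from_slice(&target.to_le_bytes()); // u64, little-endian
            }
            Stmt::Jump(reg) => {
                out.push(0x02); // JUMP
                out.push(*reg << 4);
            }
            Stmt::Halt => out.push(0x00), // HALT
        }
    }
    Ok(out)
}
```

Take the program `LOADI R5, end` / `JUMP R5` / `end: HALT`. The first pass places `end` at 12 (10 + 2), and that address is what ends up in the LOADI immediate.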

Public API

assemble

pub fn assemble(source: &str) -> Result<Vec<u8>>

Main entry point for assembling source code into bytecode. This function performs lexing, parsing, and compilation in a single call.

Parameters

source (&str, required): Assembly source code as a string

Returns

Result<Vec<u8>>: Compiled bytecode, or an error if assembly fails

Example:

use minichain_assembler::assemble;

let source = r#"
    .entry main
    main:
        LOADI R0, 42
        HALT
"#;

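// `?` only works inside a function that returns a Result; expect() keeps this snippet standalone
let bytecode = assemble(source).expect("assembly should succeed");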

assemble_with_ast

pub fn assemble_with_ast(source: &str) -> Result<(Program, Vec<u8>)>

Assemble source code and return both the parsed AST and compiled bytecode. Useful for debugging or when you need access to the program structure.

Parameters

source (&str, required): Assembly source code as a string

Returns

Result<(Program, Vec<u8>)>: Tuple of (parsed program, compiled bytecode), or an error if assembly fails

Compiler API

Compiler::new

pub fn new() -> Self

Creates a new compiler instance with empty symbol and constant tables.

Returns

Compiler: A new compiler instance

Compiler::compile

pub fn compile(program: &Program) -> Result<Vec<u8>>

Compiles a program AST to bytecode.

Parameters

program (&Program, required): The parsed AST program to compile

Returns

Result<Vec<u8>>: Compiled bytecode as a byte vector, or a compilation error

Opcode Reference

The compiler emits the following opcodes (mnemonic, opcode byte, description and operands):

Control Flow (0x00-0x0F)

HALT    0x00  Halt execution
NOP     0x01  No operation
JUMP    0x02  Unconditional jump (1 register)
JUMPI   0x03  Conditional jump (2 registers)
CALL    0x04  Call subroutine (1 register)
RET     0x05  Return from subroutine
REVERT  0x0F  Revert execution

Arithmetic (0x10-0x1F)

ADD   0x10  Addition (3 registers)
SUB   0x11  Subtraction (3 registers)
MUL   0x12  Multiplication (3 registers)
DIV   0x13  Division (3 registers)
MOD   0x14  Modulo (3 registers)
ADDI  0x15  Add immediate (2 registers + u64 immediate)

Bitwise (0x20-0x2F)

AND  0x20  Bitwise AND (3 registers)
OR   0x21  Bitwise OR (3 registers)
XOR  0x22  Bitwise XOR (3 registers)
NOT  0x23  Bitwise NOT (2 registers)
SHL  0x24  Shift left (3 registers)
SHR  0x25  Shift right (3 registers)

Comparison (0x30-0x3F)

EQ      0x30  Equal (3 registers)
NE      0x31  Not equal (3 registers)
LT      0x32  Less than (3 registers)
GT      0x33  Greater than (3 registers)
LE      0x34  Less than or equal (3 registers)
GE      0x35  Greater than or equal (3 registers)
ISZERO  0x36  Test for zero (2 registers)

Memory (0x40-0x4F)

LOAD8    0x40  Load 8-bit value from memory (2 registers)
LOAD64   0x41  Load 64-bit value from memory (2 registers)
STORE8   0x42  Store 8-bit value to memory (2 registers)
STORE64  0x43  Store 64-bit value to memory (2 registers)
MSIZE    0x44  Get memory size (1 register)
MCOPY    0x45  Copy memory (3 registers)

Storage (0x50-0x5F)

SLOAD   0x50  Load from storage (2 registers)
SSTORE  0x51  Store to storage (2 registers)

Immediate (0x70-0x7F)

LOADI  0x70  Load immediate (1 register + u64 immediate)
MOV    0x71  Move between registers (2 registers)

Context (0x80-0x8F)

CALLER       0x80  Get caller address (1 register)
CALLVALUE    0x81  Get call value (1 register)
ADDRESS      0x82  Get current address (1 register)
BLOCKNUMBER  0x83  Get block number (1 register)
TIMESTAMP    0x84  Get timestamp (1 register)
GAS          0x85  Get remaining gas (1 register)

Debug (0xF0-0xFF)

LOG  0xF0  Log value (1 register)
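One way to mirror part of this table in Rust is a `#[repr(u8)]` enum whose discriminants are the opcode bytes. The `Opcode` type and its `from_mnemonic` and `byte` helpers are hypothetical, because this page does not show how the crate represents opcodes. The sketch also lists only a subset of the instructions.

```rust
/// Hypothetical opcode enum covering a subset of the table above.
/// The explicit discriminants are the bytes the compiler emits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Opcode {
    Halt = 0x00,
    Nop = 0x01,
    Jump = 0x02,
    JumpI = 0x03,
    Call = 0x04,
    Ret = 0x05,
    Revert = 0x0F,
    Add = 0x10,
    Sub = 0x11,
    AddI = 0x15,
    Lt = 0x32,
    LoadI = 0x70,
    Mov = 0x71,
    Log = 0xF0,
}

impl Opcode {
    /// Look up an opcode by its assembly mnemonic (case-sensitive).
    pub fn from_mnemonic(m: &str) -> Option<Opcode> {
        let op = match m {
            "HALT" => Opcode::Halt,
            "NOP" => Opcode::Nop,
            "JUMP" => Opcode::Jump,
            "JUMPI" => Opcode::JumpI,
            "CALL" => Opcode::Call,
            "RET" => Opcode::Ret,
            "REVERT" => Opcode::Revert,
            "ADD" => Opcode::Add,
            "SUB" => Opcode::Sub,
            "ADDI" => Opcode::AddI,
            "LT" => Opcode::Lt,
            "LOADI" => Opcode::LoadI,
            "MOV" => Opcode::Mov,
            "LOG" => Opcode::Log,
            _ => return None,
        };
        Some(op)
    }

    /// The byte emitted for this opcode.
    pub fn byte(self) -> u8 {
        self as u8
    }
}
```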

Register Encoding

Registers are encoded in 4-bit nibbles (0-15 for R0-R15):

Single register: Stored in upper nibble, lower nibble is 0
Two registers: Packed as (reg1 << 4) | reg2
Three registers: First two packed in byte 1, third in upper nibble of byte 2

Examples

// Single register: JUMP R5
// Bytecode: [0x02, 0x50]
//           [JUMP, R5 << 4]

// Two registers: MOV R0, R1
// Bytecode: [0x71, 0x01]
//           [MOV, (R0 << 4) | R1]

// Three registers: ADD R2, R0, R1
// Bytecode: [0x10, 0x20, 0x10]
//           [ADD, (R2 << 4) | R0, R1 << 4]
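The packing rules can be written as small helpers. `pack1`, `pack2`, `pack3`, and `unpack2` are illustrative names, not part of the documented API. They reproduce the three example encodings above.

```rust
/// One register: index in the upper nibble, lower nibble zero (JUMP R5 -> 0x50).
pub fn pack1(r: u8) -> u8 {
    debug_assert!(r < 16, "register must fit in a nibble");
    r << 4
}

/// Two registers in one byte: (r1 << 4) | r2 (MOV R0, R1 -> 0x01).
pub fn pack2(r1: u8, r2: u8) -> u8 {
    debug_assert!(r1 < 16 && r2 < 16, "registers must fit in a nibble");
    (r1 << 4) | r2
}

/// Three registers in two bytes: first two packed, third in the upper
/// nibble of the second byte (ADD R2, R0, R1 -> [0x20, 0x10]).
pub fn pack3(r1: u8, r2: u8, r3: u8) -> [u8; 2] {
    [pack2(r1, r2), pack1(r3)]
}

/// Inverse of pack2: split a byte into (upper, lower) nibbles.
pub fn unpack2(b: u8) -> (u8, u8) {
    (b >> 4, b & 0x0F)
}
```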

Immediate Values

Immediate values are encoded as 64-bit little-endian integers:

// LOADI R0, 255
// Bytecode: [0x70, 0x00, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
//           [LOADI, R0, value as 8 bytes LE]
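The following sketch of the LOADI encoding assumes the layout shown above: the opcode, then a single-register byte, then an 8-byte little-endian immediate. `encode_loadi` and `decode_loadi_imm` are hypothetical helpers. The standard library's `u64::to_le_bytes` and `u64::from_le_bytes` handle the byte order.

```rust
/// Encode `LOADI Rn, imm`: opcode 0x70, register in the upper nibble,
/// then the immediate as 8 little-endian bytes (10 bytes total).
pub fn encode_loadi(reg: u8, imm: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(10);
    out.push(0x70); // LOADI opcode
    out.push(reg << 4); // single-register byte
    out.extend_from_slice(&imm.to_le_bytes()); // least significant byte first
    out
}

/// Read the immediate back out of an encoded LOADI (bytes 2..10).
/// Returns None if the slice is too short.
pub fn decode_loadi_imm(code: &[u8]) -> Option<u64> {
    let bytes: [u8; 8] = code.get(2..10)?.try_into().ok()?;
    Some(u64::from_le_bytes(bytes))
}
```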

Error Types

CompileError

pub enum CompileError {
    UndefinedLabel(String),
    DuplicateLabel { label: String, first_addr: u64 },
    InvalidRegister(u8),
}

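UndefinedLabel(String): A label was referenced but never defined.

DuplicateLabel { label, first_addr }: A label was defined more than once; first_addr is the address of its first definition.

InvalidRegister(u8): A register operand outside the valid range R0-R15.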
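Registers occupy a 4-bit nibble, so only indices 0 to 15 can be encoded. The sketch below shows a check that would produce `InvalidRegister`. `check_register` is hypothetical, and so are the derives on the stand-in `CompileError`. This page also does not document where the real compiler performs this check.

```rust
/// Stand-in copy of the documented enum; the derives are assumptions
/// added so results can be compared in tests.
#[derive(Debug, PartialEq, Eq)]
pub enum CompileError {
    UndefinedLabel(String),
    DuplicateLabel { label: String, first_addr: u64 },
    InvalidRegister(u8),
}

/// Accept R0-R15 (fits in a nibble); reject anything larger.
pub fn check_register(r: u8) -> Result<u8, CompileError> {
    if r <= 15 {
        Ok(r)
    } else {
        Err(CompileError::InvalidRegister(r))
    }
}
```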

Usage Examples

Basic Compilation

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;

let source = r#"
LOADI R0, 10
HALT
"#;

let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();

// LOADI R0, 10: [0x70, 0x00, 0x0A, 0x00, ...]
// HALT: [0x00]
assert_eq!(bytecode[0], 0x70);  // LOADI opcode
assert_eq!(bytecode[1], 0x00);  // R0
assert_eq!(bytecode[2], 10);    // immediate value
assert_eq!(bytecode[10], 0x00); // HALT

Label Resolution

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;

let source = r#"
.entry main

main:
    LOADI R0, 10
    LOADI R5, loop_end
    JUMP R5

loop_end:
    HALT
"#;

let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();

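// Labels are resolved to addresses during compilation:
// LOADI (10 bytes) + LOADI (10 bytes) + JUMP (2 bytes) places loop_end at 22
assert_eq!(bytecode[12], 22);    // immediate of LOADI R5, loop_end
assert_eq!(bytecode.len(), 23);  // 22 bytes + HALT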

Forward References

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;

let source = r#"
LOADI R5, end      ; Forward reference
JUMP R5
ADD R0, R1, R2

end:
    HALT
"#;

let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();

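// Forward references work due to two-pass compilation:
// end = 10 (LOADI) + 2 (JUMP) + 3 (ADD) = 15
assert_eq!(bytecode[2], 15);     // immediate of LOADI R5, end
assert_eq!(bytecode.len(), 16);  // 15 bytes + HALT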

Constants

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;

let source = r#"
.const MAX_VALUE 100

LOADI R0, MAX_VALUE
HALT
"#;

let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();

// Constant is resolved to its value
assert_eq!(bytecode[0], 0x70);  // LOADI
assert_eq!(bytecode[1], 0x00);  // R0
assert_eq!(bytecode[2], 100);   // MAX_VALUE

Three-Register Instruction

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;

let source = "ADD R2, R0, R1";
let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();

// ADD R2, R0, R1
// Byte 0: 0x10 (ADD opcode)
// Byte 1: 0x20 (R2 in upper nibble, R0 in lower nibble)
// Byte 2: 0x10 (R1 in upper nibble)
assert_eq!(bytecode, vec![0x10, 0x20, 0x10]);

Error Handling: Undefined Label

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::{Compiler, CompileError};

let source = "LOADI R5, undefined_label";
let program = Parser::parse(source).unwrap();
let result = Compiler::compile(&program);

match result {
    Err(CompileError::UndefinedLabel(label)) => {
        assert_eq!(label, "undefined_label");
    }
    _ => panic!("Expected undefined label error"),
}

Error Handling: Duplicate Label

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::{Compiler, CompileError};

let source = r#"
main:
    HALT
main:
    HALT
"#;

let program = Parser::parse(source).unwrap();
let result = Compiler::compile(&program);

match result {
    Err(CompileError::DuplicateLabel { label, first_addr }) => {
        assert_eq!(label, "main");
        assert_eq!(first_addr, 0);
    }
    _ => panic!("Expected duplicate label error"),
}

Complete Example

use minichain_assembler::parser::Parser;
use minichain_assembler::compiler::Compiler;

let source = r#"
.entry main
.const ITERATIONS 10

main:
    LOADI R0, 0              ; counter = 0
    LOADI R1, ITERATIONS     ; max = 10
    LOADI R5, loop_body      ; load loop address

loop_body:
    ADDI R0, R0, 1           ; counter++
    LT R2, R0, R1            ; counter < max?
    LOADI R5, loop_body      ; reload loop address
    JUMPI R2, R5             ; jump if condition true
    HALT                     ; exit
"#;

let program = Parser::parse(source).unwrap();
let bytecode = Compiler::compile(&program).unwrap();

println!("Compiled {} bytes", bytecode.len());
println!("Bytecode: {:02X?}", bytecode);

Compilation Pipeline

First Pass: Symbol Collection

Iterate through all statements
Track current bytecode address
Record label addresses in symbol table
Store constant values
Validate no duplicate labels
Calculate instruction sizes for address tracking
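The duplicate-label check in the first pass can be sketched with `HashMap`'s entry API. `Item` and `collect_labels` are hypothetical. The stand-in `CompileError` keeps only the documented `DuplicateLabel` variant, and the sketch assumes instruction sizes are already known.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Stand-in error with only the variant this sketch needs.
#[derive(Debug, PartialEq, Eq)]
pub enum CompileError {
    DuplicateLabel { label: String, first_addr: u64 },
}

/// Hypothetical first-pass item: a label, or an instruction of known size.
pub enum Item<'a> {
    Label(&'a str),
    Instr { size: u64 },
}

/// Track the current address and record each label's address.
/// A second definition of the same name reports where the first one was.
pub fn collect_labels(items: &[Item]) -> Result<HashMap<String, u64>, CompileError> {
    let mut labels = HashMap::new();
    let mut addr = 0u64;
    for item in items {
        match item {
            Item::Label(name) => match labels.entry(name.to_string()) {
                Entry::Occupied(e) => {
                    return Err(CompileError::DuplicateLabel {
                        label: name.to_string(),
                        first_addr: *e.get(),
                    })
                }
                Entry::Vacant(e) => {
                    e.insert(addr);
                }
            },
            // Instructions only advance the address counter in this pass.
            Item::Instr { size } => addr += size,
        }
    }
    Ok(labels)
}
```

For the duplicate-label example earlier (`main:` / `HALT` / `main:` / `HALT`), this returns `DuplicateLabel { label: "main", first_addr: 0 }`.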

Second Pass: Bytecode Emission

Iterate through all statements again
Skip labels and directives (non-code)
For each instruction:
- Emit opcode byte
- Encode register operands
- Resolve label references to addresses
- Emit immediate values in little-endian
Return complete bytecode vector

Bytecode Format

Each instruction follows a consistent format:

[Opcode] [Operands...]

1 byte: Opcode
0-2 bytes: Register operands (packed in nibbles)
0-8 bytes: Immediate value (64-bit little-endian)

Size Examples

HALT: 1 byte total
JUMP R5: 2 bytes total
MOV R0, R1: 2 bytes total
ADD R2, R0, R1: 3 bytes total
LOADI R0, 255: 10 bytes total (1 opcode + 1 register byte + 8 immediate)
ADDI R0, R1, 10: 10 bytes total (1 opcode + 1 packed register byte + 8 immediate)
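The size rules reduce to one opcode byte, plus 0 to 2 bytes of packed registers, plus 8 bytes when an immediate follows. The hypothetical `Shape` enum below groups instructions by operand layout. `encoded_size` reproduces every size listed above, which is the figure the first pass needs to advance its address counter.

```rust
/// Hypothetical operand layouts, one per encoding described above.
#[derive(Debug, Clone, Copy)]
pub enum Shape {
    NoOperands, // e.g. HALT
    OneReg,     // e.g. JUMP R5
    TwoRegs,    // e.g. MOV R0, R1
    ThreeRegs,  // e.g. ADD R2, R0, R1
    OneRegImm,  // e.g. LOADI R0, 255
    TwoRegsImm, // e.g. ADDI R0, R1, 10
}

/// 1 opcode byte + packed register bytes + 8 bytes if an immediate follows.
pub fn encoded_size(shape: Shape) -> u64 {
    let (reg_bytes, imm_bytes) = match shape {
        Shape::NoOperands => (0, 0),
        Shape::OneReg | Shape::TwoRegs => (1, 0), // one or two nibbles fit in one byte
        Shape::ThreeRegs => (2, 0),               // third register needs a second byte
        Shape::OneRegImm | Shape::TwoRegsImm => (1, 8),
    };
    1 + reg_bytes + imm_bytes
}
```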

Symbol Table

The compiler maintains two tables:

Label Table

Maps label names to bytecode addresses:

HashMap<String, u64>

// Example:
// "main" -> 0
// "loop_body" -> 20
// "end" -> 45

Constant Table

Maps constant names to values:

HashMap<String, u64>

// Example:
// "MAX_VALUE" -> 100
// "ITERATIONS" -> 10

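When resolving a LoadILabel instruction (a LOADI whose operand is a name rather than a number), the compiler checks both tables, so the operand may name either a label or a constant.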
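The following sketch of the lookup assumes both tables are `HashMap<String, u64>`, as shown above. The text only says both tables are checked, so checking labels before constants is an assumption. `resolve` is a hypothetical helper. Its `Err` carries the unresolved name, the same information `CompileError::UndefinedLabel` holds.

```rust
use std::collections::HashMap;

/// Resolve a LOADI operand name against the label table, then the
/// constant table (the order is an assumption). Err holds the missing name.
pub fn resolve(
    name: &str,
    labels: &HashMap<String, u64>,
    constants: &HashMap<String, u64>,
) -> Result<u64, String> {
    labels
        .get(name)
        .or_else(|| constants.get(name))
        .copied()
        .ok_or_else(|| name.to_string())
}
```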

Related

Lexer - Tokenize assembly source
Parser - Parse tokens into AST
Virtual Machine - Execute compiled bytecode
