Parser - Minichain

Overview

The parser module converts tokenized assembly code into a structured Abstract Syntax Tree (AST). It validates syntax, tracks labels, and builds a representation ready for bytecode compilation.

AST Types

Program

The top-level AST structure representing a complete assembly program.

pub struct Program {
    pub statements: Vec<Statement>,
    pub entry_point: Option<String>,
}

statements: Vec<Statement> (required)
All statements in the program (labels, instructions, directives).

entry_point: Option<String>
Optional entry point label specified by the .entry directive.

Statement

A single statement in the assembly program.

pub enum Statement {
    Label(String),
    Instruction(Instruction),
    Directive(Directive),
}

Label(String): a label definition (e.g., main:)
Instruction(Instruction): an executable instruction
Directive(Directive): an assembler directive
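Because every statement is one of three variants, consumers of the AST usually walk `statements` with a `match`. The following is a minimal sketch of that pattern. It redefines small stand-ins for the documented types so that it compiles on its own; the `label_names` helper is hypothetical and not part of the crate, and the reduced `Instruction` enum lists only two of the documented variants.

```rust
// Local stand-ins mirroring the documented AST shapes (assumption: the real
// crate's types carry the full instruction set and may derive other traits).
#[derive(Debug, Clone, PartialEq)]
pub enum Instruction {
    Halt,
    LoadI { dst: u8, value: u64 },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Directive {
    Entry(String),
    Const(String, u64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Label(String),
    Instruction(Instruction),
    Directive(Directive),
}

/// Hypothetical helper: collect the names of all labels, in source order.
pub fn label_names(statements: &[Statement]) -> Vec<&str> {
    statements
        .iter()
        .filter_map(|s| match s {
            Statement::Label(name) => Some(name.as_str()),
            // Instructions and directives are skipped.
            _ => None,
        })
        .collect()
}
```

The same `match` shape extends to instructions or directives by changing which arm returns `Some`.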

Directive

Assembler directives that control compilation.

pub enum Directive {
    Entry(String),
    Const(String, u64),
}

Entry(String): specifies the program entry point label
Const(String, u64): defines a named constant with a value
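One way a later stage might consume `Const` directives is to gather them into a lookup table. The sketch below assumes exactly that. `const_table` is a hypothetical helper, the enum is a local copy of the documented one, and the overwrite-on-redefinition behavior is an assumption, since the source does not say how the assembler treats a repeated constant name.

```rust
use std::collections::HashMap;

// Local copy of the documented enum so the sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum Directive {
    Entry(String),
    Const(String, u64),
}

/// Hypothetical helper: build a name -> value table from `Const` directives.
/// A later definition of the same name overwrites the earlier one here;
/// that policy is an assumption, not documented behavior.
pub fn const_table(directives: &[Directive]) -> HashMap<String, u64> {
    let mut table = HashMap::new();
    for d in directives {
        if let Directive::Const(name, value) = d {
            table.insert(name.clone(), *value);
        }
    }
    table
}
```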

Instruction

All supported assembly instructions organized by category.

Control Flow Instructions

Instruction::Halt
Instruction::Nop
Instruction::Jump { target: u8 }
Instruction::JumpI { cond: u8, target: u8 }
Instruction::Call { target: u8 }
Instruction::Ret
Instruction::Revert

Arithmetic Instructions

Instruction::Add { dst: u8, s1: u8, s2: u8 }
Instruction::Sub { dst: u8, s1: u8, s2: u8 }
Instruction::Mul { dst: u8, s1: u8, s2: u8 }
Instruction::Div { dst: u8, s1: u8, s2: u8 }
Instruction::Mod { dst: u8, s1: u8, s2: u8 }
Instruction::AddI { dst: u8, src: u8, imm: u64 }

Bitwise Instructions

Instruction::And { dst: u8, s1: u8, s2: u8 }
Instruction::Or { dst: u8, s1: u8, s2: u8 }
Instruction::Xor { dst: u8, s1: u8, s2: u8 }
Instruction::Not { dst: u8, src: u8 }
Instruction::Shl { dst: u8, s1: u8, s2: u8 }
Instruction::Shr { dst: u8, s1: u8, s2: u8 }

Comparison Instructions

Instruction::Eq { dst: u8, s1: u8, s2: u8 }
Instruction::Ne { dst: u8, s1: u8, s2: u8 }
Instruction::Lt { dst: u8, s1: u8, s2: u8 }
Instruction::Gt { dst: u8, s1: u8, s2: u8 }
Instruction::Le { dst: u8, s1: u8, s2: u8 }
Instruction::Ge { dst: u8, s1: u8, s2: u8 }
Instruction::IsZero { dst: u8, src: u8 }

Memory Instructions

Instruction::Load8 { dst: u8, addr: u8 }
Instruction::Load64 { dst: u8, addr: u8 }
Instruction::Store8 { addr: u8, src: u8 }
Instruction::Store64 { addr: u8, src: u8 }
Instruction::MSize { dst: u8 }
Instruction::MCopy { dst: u8, src: u8, len: u8 }

Storage Instructions

Instruction::SLoad { dst: u8, key: u8 }
Instruction::SStore { key: u8, value: u8 }

Immediate Instructions

Instruction::LoadI { dst: u8, value: u64 }
Instruction::LoadILabel { dst: u8, label: String }
Instruction::Mov { dst: u8, src: u8 }

Context Instructions

Instruction::Caller { dst: u8 }
Instruction::CallValue { dst: u8 }
Instruction::Address { dst: u8 }
Instruction::BlockNumber { dst: u8 }
Instruction::Timestamp { dst: u8 }
Instruction::Gas { dst: u8 }

Debug Instructions

Instruction::Log { src: u8 }

Parser API

Parser::new

pub fn new(source: &'source str) -> Self

Creates a new parser for the given assembly source.

source: &str (required)
Assembly source code to parse.

Returns Parser<'source>: a new parser instance.

Parser::parse

pub fn parse(source: &'source str) -> Result<Program>

Parses assembly source into a complete program AST.

source: &str (required)
Assembly source code to parse.

Returns Result<Program>: the parsed program on success, or a parse error.

Instruction::byte_size

pub fn byte_size(&self) -> usize

Calculates the size of an instruction in bytes when compiled.

Returns usize: the instruction size in bytes, by operand shape:

No operands: 1 byte
Single register: 2 bytes
Two registers: 2 bytes
Three registers: 3 bytes
Register + immediate: 10 bytes (1 opcode + 1 register + 8 immediate)
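Byte sizes matter mainly because they determine where each label lands in the compiled output. The sketch below shows one way to turn the size rules listed above into label offsets. It uses a reduced local `Instruction` enum with one variant per operand shape; `label_offsets` is a hypothetical helper, and the real compiler may resolve labels differently.

```rust
use std::collections::HashMap;

// Reduced stand-in: one variant per operand shape from the size list.
#[derive(Debug, Clone, PartialEq)]
pub enum Instruction {
    Halt,                            // no operands
    Jump { target: u8 },             // single register
    Mov { dst: u8, src: u8 },        // two registers
    Add { dst: u8, s1: u8, s2: u8 }, // three registers
    LoadI { dst: u8, value: u64 },   // register + immediate
}

impl Instruction {
    /// Sizes follow the documented rules for each operand shape.
    pub fn byte_size(&self) -> usize {
        match self {
            Instruction::Halt => 1,
            Instruction::Jump { .. } => 2,
            Instruction::Mov { .. } => 2,
            Instruction::Add { .. } => 3,
            Instruction::LoadI { .. } => 10,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Label(String),
    Instruction(Instruction),
}

/// Hypothetical helper: map each label to the byte offset of the
/// instruction that follows it.
pub fn label_offsets(statements: &[Statement]) -> HashMap<String, usize> {
    let mut offsets = HashMap::new();
    let mut pc = 0usize; // running byte offset
    for s in statements {
        match s {
            // A label occupies no bytes; it records the current offset.
            Statement::Label(name) => {
                offsets.insert(name.clone(), pc);
            }
            Statement::Instruction(i) => pc += i.byte_size(),
        }
    }
    offsets
}
```

Directives are left out of this reduced `Statement` on the assumption that they emit no bytes.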

Error Types

ParseError

pub enum ParseError {
    UnexpectedToken { expected: String, found: String, line: usize },
    UnexpectedEof,
    InvalidRegister { register: String, line: usize },
    DuplicateLabel { label: String, line: usize, first_line: usize },
}

UnexpectedToken: encountered an unexpected token during parsing. Includes the expected token type, what was found, and the line number.
UnexpectedEof: reached end of input unexpectedly while parsing.
InvalidRegister: invalid register reference (not R0-R15).
DuplicateLabel: label defined multiple times in the program.
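Each variant carries enough context to produce a readable diagnostic. As an illustration, the sketch below implements `std::fmt::Display` for a local copy of the enum. The message wording is invented for this example, and whether the crate already provides a `Display` implementation is not stated in the source.

```rust
use std::fmt;

// Local copy of the documented enum so the sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum ParseError {
    UnexpectedToken { expected: String, found: String, line: usize },
    UnexpectedEof,
    InvalidRegister { register: String, line: usize },
    DuplicateLabel { label: String, line: usize, first_line: usize },
}

// Message formats below are hypothetical, chosen only for illustration.
impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::UnexpectedToken { expected, found, line } => {
                write!(f, "line {}: expected {}, found {}", line, expected, found)
            }
            ParseError::UnexpectedEof => write!(f, "unexpected end of input"),
            ParseError::InvalidRegister { register, line } => {
                write!(f, "line {}: invalid register '{}'", line, register)
            }
            ParseError::DuplicateLabel { label, line, first_line } => write!(
                f,
                "line {}: label '{}' already defined on line {}",
                line, label, first_line
            ),
        }
    }
}
```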

Usage Examples

Basic Parsing

use minichain_assembler::parser::Parser;

let source = r#"
LOADI R0, 10
HALT
"#;

let program = Parser::parse(source).unwrap();
assert_eq!(program.statements.len(), 2);

Parsing with Labels

use minichain_assembler::parser::{Parser, Statement};

let source = r#"
main:
    LOADI R0, 10
    HALT
"#;

let program = Parser::parse(source).unwrap();

// First statement is the label
match &program.statements[0] {
    Statement::Label(name) => assert_eq!(name, "main"),
    _ => panic!("Expected label"),
}

Entry Point Directive

use minichain_assembler::parser::Parser;

let source = r#"
.entry main

main:
    LOADI R0, 42
    HALT
"#;

let program = Parser::parse(source).unwrap();
assert_eq!(program.entry_point, Some("main".to_string()));

Constant Directive

use minichain_assembler::parser::{Parser, Statement, Directive};

let source = r#"
.const MAX_VALUE 100
LOADI R0, MAX_VALUE
"#;

let program = Parser::parse(source).unwrap();

match &program.statements[0] {
    Statement::Directive(Directive::Const(name, value)) => {
        assert_eq!(name, "MAX_VALUE");
        assert_eq!(*value, 100);
    }
    _ => panic!("Expected const directive"),
}

Three-Register Instructions

use minichain_assembler::parser::{Parser, Statement, Instruction};

let source = "ADD R2, R0, R1";
let program = Parser::parse(source).unwrap();

match &program.statements[0] {
    Statement::Instruction(Instruction::Add { dst, s1, s2 }) => {
        assert_eq!(*dst, 2);
        assert_eq!(*s1, 0);
        assert_eq!(*s2, 1);
    }
    _ => panic!("Expected ADD instruction"),
}

Label References

use minichain_assembler::parser::{Parser, Statement, Instruction};

let source = "LOADI R5, loop_start";
let program = Parser::parse(source).unwrap();

match &program.statements[0] {
    Statement::Instruction(Instruction::LoadILabel { dst, label }) => {
        assert_eq!(*dst, 5);
        assert_eq!(label, "loop_start");
    }
    _ => panic!("Expected LoadILabel"),
}

Instruction Byte Sizes

use minichain_assembler::parser::Instruction;

// No operands: 1 byte
assert_eq!(Instruction::Halt.byte_size(), 1);

// Single register: 2 bytes
assert_eq!(Instruction::Jump { target: 0 }.byte_size(), 2);

// Two registers: 2 bytes
assert_eq!(Instruction::Mov { dst: 0, src: 1 }.byte_size(), 2);

// Three registers: 3 bytes
assert_eq!(
    Instruction::Add { dst: 0, s1: 1, s2: 2 }.byte_size(),
    3
);

// Register + immediate: 10 bytes
assert_eq!(
    Instruction::LoadI { dst: 0, value: 100 }.byte_size(),
    10
);

Error Handling

use minichain_assembler::parser::{Parser, ParseError};

let source = "ADD R0";
let result = Parser::parse(source);

match result {
    Err(ParseError::UnexpectedToken { expected, found, line }) => {
        assert_eq!(line, 1);
        println!("Expected: {}, Found: {}", expected, found);
    }
    _ => panic!("Expected parse error"),
}

Complete Program Example

use minichain_assembler::parser::Parser;

let source = r#"
.entry main
.const ITERATIONS 10

main:
    LOADI R0, 0              ; counter
    LOADI R1, ITERATIONS     ; max iterations
    LOADI R5, loop_body

loop_body:
    ADDI R0, R0, 1           ; increment counter
    LT R2, R0, R1            ; check if counter < max
    LOADI R5, loop_body
    JUMPI R2, R5             ; continue if true
    HALT
"#;

let program = Parser::parse(source).unwrap();
assert_eq!(program.entry_point, Some("main".to_string()));

Parsing Workflow

1. Tokenization: Source code is tokenized by the lexer.
2. Statement Parsing: Each statement is parsed into one of:
   - Labels (identifier followed by colon)
   - Instructions (opcodes with operands)
   - Directives (starting with .)
3. Operand Validation: Register numbers, immediate values, and labels are validated.
4. Entry Point Tracking: The .entry directive sets the program entry point.
5. AST Construction: All statements are collected into a Program structure.
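The statement-parsing step above distinguishes three shapes. The sketch below illustrates that decision on raw text lines. This is a simplification: the real parser works on lexer tokens rather than strings. `LineKind` and `classify` are hypothetical names, and the sketch assumes one statement per line with `;` starting a comment, as in the complete program example.

```rust
/// Hypothetical classification of a single source line.
#[derive(Debug, PartialEq)]
pub enum LineKind<'a> {
    Label(&'a str),
    Directive(&'a str),
    Instruction(&'a str),
    Empty,
}

/// Classify one line by shape (assumes one statement per line).
pub fn classify(line: &str) -> LineKind<'_> {
    // Drop a trailing `; comment`, then surrounding whitespace.
    let code = line.split(';').next().unwrap_or("").trim();
    if code.is_empty() {
        LineKind::Empty
    } else if let Some(name) = code.strip_suffix(':') {
        // Identifier followed by a colon.
        LineKind::Label(name.trim_end())
    } else if code.starts_with('.') {
        LineKind::Directive(code)
    } else {
        // Anything else is treated as an opcode with operands.
        LineKind::Instruction(code)
    }
}
```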

Instruction Syntax

No Operands

HALT
NOP
RET
REVERT

Single Register

JUMP R5
CALL R3
LOG R0

Two Registers

MOV R0, R1
NOT R2, R3
LOAD8 R0, R5

Three Registers

ADD R0, R1, R2
MUL R3, R4, R5
EQ R6, R7, R8
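Every operand in the register forms above must name a valid register. Below is a minimal sketch of that check, assuming the R0-R15 range given in the InvalidRegister description and an uppercase `R` prefix. `parse_register` is a hypothetical helper; the crate reports failures through `ParseError::InvalidRegister` rather than `Option`.

```rust
/// Hypothetical helper: parse "R0".."R15" into a register index.
/// Returns None for anything else.
pub fn parse_register(text: &str) -> Option<u8> {
    let digits = text.strip_prefix('R')?;
    // Reject an empty suffix and non-digit suffixes such as "R+1".
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Values that overflow u8 (e.g. "R999") fail to parse and yield None.
    let n: u8 = digits.parse().ok()?;
    if n <= 15 {
        Some(n)
    } else {
        None
    }
}
```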

Register + Immediate

LOADI R0, 42
LOADI R1, 0xFF
ADDI R2, R3, 10

Register + Label

LOADI R5, loop_start
LOADI R6, end_address
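LOADI accepts either a number or a label as its second operand, which is why the AST has both `LoadI` and `LoadILabel`. The sketch below shows one way to tell the two apart. `LoadOperand` and `parse_load_operand` are hypothetical names, and the identifier rule (ASCII letters, digits, and underscores, not starting with a digit) is an assumption based on the label names used in the examples.

```rust
/// Hypothetical result of classifying the second LOADI operand.
#[derive(Debug, PartialEq)]
pub enum LoadOperand {
    Immediate(u64),
    Label(String),
}

/// Classify an operand as a decimal or 0x-prefixed hex immediate, or a label.
pub fn parse_load_operand(text: &str) -> Option<LoadOperand> {
    // Hex immediates such as 0xFF.
    if let Some(hex) = text.strip_prefix("0x") {
        return u64::from_str_radix(hex, 16).ok().map(LoadOperand::Immediate);
    }
    let first = text.chars().next()?;
    // Decimal immediates such as 42.
    if first.is_ascii_digit() {
        return text.parse::<u64>().ok().map(LoadOperand::Immediate);
    }
    // Otherwise the operand must look like an identifier (assumed rule).
    let is_ident = (first.is_ascii_alphabetic() || first == '_')
        && text.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if is_ident {
        Some(LoadOperand::Label(text.to_string()))
    } else {
        None
    }
}
```

Under this scheme an `Immediate` would map to `Instruction::LoadI` and a `Label` to `Instruction::LoadILabel`.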

Related

Lexer - Tokenize assembly source
Compiler - Compile AST to bytecode
