Overview
The parser module converts tokenized assembly code into a structured Abstract Syntax Tree (AST). It validates syntax, tracks labels, and builds a representation ready for bytecode compilation.
AST Types
Program
The top-level AST structure representing a complete assembly program.
pub struct Program {
pub statements: Vec<Statement>,
pub entry_point: Option<String>,
}
statements: All statements in the program (labels, instructions, and directives)
entry_point: Optional entry point label, specified by the .entry directive
Statement
A single statement in the assembly program.
pub enum Statement {
Label(String),
Instruction(Instruction),
Directive(Directive),
}
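Label(String): A label definition (e.g., main:)
Instruction(Instruction): An executable instruction
Directive(Directive): An assembler directive such as .entry or .const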
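Labels, instructions, and directives all share the statements vector, so any consumer has to walk it to find label definitions. The sketch below shows one way to confirm that the .entry label is actually defined. It uses local stand-in types that mirror the documented shapes (Instruction is cut down to two variants), and the helpers defined_labels and entry_point_is_defined are hypothetical, not part of the crate.

```rust
// Stand-in types mirroring the documented AST; not the crate's own definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Instruction {
    Halt,
    LoadI { dst: u8, value: u64 },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Directive {
    Entry(String),
    Const(String, u64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Label(String),
    Instruction(Instruction),
    Directive(Directive),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Program {
    pub statements: Vec<Statement>,
    pub entry_point: Option<String>,
}

/// Hypothetical helper: names of all labels defined in the program.
pub fn defined_labels(program: &Program) -> Vec<&str> {
    program
        .statements
        .iter()
        .filter_map(|stmt| match stmt {
            Statement::Label(name) => Some(name.as_str()),
            _ => None,
        })
        .collect()
}

/// Hypothetical check: true if there is no .entry, or if its label is defined.
pub fn entry_point_is_defined(program: &Program) -> bool {
    match &program.entry_point {
        None => true,
        Some(entry) => defined_labels(program).contains(&entry.as_str()),
    }
}
```

Returning borrowed &str slices keeps the walk allocation-free apart from the result vector.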
Directive
Assembler directives that control compilation.
pub enum Directive {
Entry(String),
Const(String, u64),
}
Entry(String): Specifies the program entry point label
Const(String, u64): Defines a named constant and its value
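This page does not say whether .const names are substituted by the parser or later by the compiler. As a sketch under the assumption that a later pass handles it, the code below collects Directive::Const entries into a lookup table and rejects a constant defined twice (also an assumed rule). The function build_const_table is hypothetical, and the Directive enum is a local stand-in for the documented one.

```rust
use std::collections::HashMap;

// Local stand-in mirroring the documented Directive enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Directive {
    Entry(String),
    Const(String, u64),
}

/// Hypothetical pass: map each .const name to its value.
/// Returns the offending name if the same constant is defined twice.
pub fn build_const_table(directives: &[Directive]) -> Result<HashMap<String, u64>, String> {
    let mut table = HashMap::new();
    for directive in directives {
        // .entry and any other directives are ignored here.
        if let Directive::Const(name, value) = directive {
            // insert returns the previous value if the key was already present.
            if table.insert(name.clone(), *value).is_some() {
                return Err(name.clone());
            }
        }
    }
    Ok(table)
}
```

A later pass could then look up an operand such as MAX_VALUE in this table before falling back to treating it as a label.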
Instruction
All supported assembly instructions organized by category.
Control Flow Instructions
Instruction::Halt
Instruction::Nop
Instruction::Jump { target: u8 }
Instruction::JumpI { cond: u8, target: u8 }
Instruction::Call { target: u8 }
Instruction::Ret
Instruction::Revert
Arithmetic Instructions
Instruction::Add { dst: u8, s1: u8, s2: u8 }
Instruction::Sub { dst: u8, s1: u8, s2: u8 }
Instruction::Mul { dst: u8, s1: u8, s2: u8 }
Instruction::Div { dst: u8, s1: u8, s2: u8 }
Instruction::Mod { dst: u8, s1: u8, s2: u8 }
Instruction::AddI { dst: u8, src: u8, imm: u64 }
Bitwise Instructions
Instruction::And { dst: u8, s1: u8, s2: u8 }
Instruction::Or { dst: u8, s1: u8, s2: u8 }
Instruction::Xor { dst: u8, s1: u8, s2: u8 }
Instruction::Not { dst: u8, src: u8 }
Instruction::Shl { dst: u8, s1: u8, s2: u8 }
Instruction::Shr { dst: u8, s1: u8, s2: u8 }
Comparison Instructions
Instruction::Eq { dst: u8, s1: u8, s2: u8 }
Instruction::Ne { dst: u8, s1: u8, s2: u8 }
Instruction::Lt { dst: u8, s1: u8, s2: u8 }
Instruction::Gt { dst: u8, s1: u8, s2: u8 }
Instruction::Le { dst: u8, s1: u8, s2: u8 }
Instruction::Ge { dst: u8, s1: u8, s2: u8 }
Instruction::IsZero { dst: u8, src: u8 }
Memory Instructions
Instruction::Load8 { dst: u8, addr: u8 }
Instruction::Load64 { dst: u8, addr: u8 }
Instruction::Store8 { addr: u8, src: u8 }
Instruction::Store64 { addr: u8, src: u8 }
Instruction::MSize { dst: u8 }
Instruction::MCopy { dst: u8, src: u8, len: u8 }
Storage Instructions
Instruction::SLoad { dst: u8, key: u8 }
Instruction::SStore { key: u8, value: u8 }
Instruction::LoadI { dst: u8, value: u64 }
Instruction::LoadILabel { dst: u8, label: String }
Instruction::Mov { dst: u8, src: u8 }
Context Instructions
Instruction::Caller { dst: u8 }
Instruction::CallValue { dst: u8 }
Instruction::Address { dst: u8 }
Instruction::BlockNumber { dst: u8 }
Instruction::Timestamp { dst: u8 }
Instruction::Gas { dst: u8 }
Debug Instructions
Instruction::Log { src: u8 }
Parser API
Parser::new
pub fn new(source: &'source str) -> Self
Creates a new parser for the given assembly source.
source: Assembly source code to parse
Parser::parse
pub fn parse(source: &'source str) -> Result<Program>
Parses assembly source into a complete program AST.
source: Assembly source code to parse
Returns: The parsed Program on success, or a ParseError if the source is invalid
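Part of what parse validates is that every register operand names R0-R15, which is where InvalidRegister comes from. A minimal sketch of that check, assuming registers are written as an uppercase R followed by a decimal index (as in every example on this page); parse_register is a hypothetical helper and ParseError here is a local stand-in holding only the one variant.

```rust
/// Local stand-in for the documented InvalidRegister variant.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    InvalidRegister { register: String, line: usize },
}

/// Hypothetical helper: turn a register token such as "R7" into its index (0-15).
pub fn parse_register(token: &str, line: usize) -> Result<u8, ParseError> {
    let invalid = || ParseError::InvalidRegister {
        register: token.to_string(),
        line,
    };
    // Require the uppercase "R" prefix used throughout the examples.
    let digits = match token.strip_prefix('R') {
        Some(digits) => digits,
        None => return Err(invalid()),
    };
    // Accept only plain decimal digits, so "R", "Rx", and "R+1" are rejected.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(invalid());
    }
    // Values that overflow u8 or exceed 15 are out of range.
    match digits.parse::<u8>() {
        Ok(index) if index <= 15 => Ok(index),
        _ => Err(invalid()),
    }
}
```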
Instruction::byte_size
pub fn byte_size(&self) -> usize
Returns the number of bytes the instruction occupies once compiled to bytecode.
The size depends only on the operand layout:
- No operands: 1 byte
- Single register: 2 bytes
- Two registers: 2 bytes
- Three registers: 3 bytes
- Register + immediate: 10 bytes (1 opcode + 1 register + 8 immediate)
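Because the size depends only on the operand layout, the table can be written as a single match. The sketch below uses a hypothetical OperandShape enum instead of the real Instruction type. One encoding consistent with these numbers, though this page does not confirm it, packs two 4-bit register indices (R0-R15) into one byte, which would explain why two registers still take only 2 bytes.

```rust
/// Hypothetical operand layouts, one per row of the size table above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OperandShape {
    NoOperands,        // e.g. HALT
    OneRegister,       // e.g. JUMP with a target register
    TwoRegisters,      // e.g. MOV dst, src
    ThreeRegisters,    // e.g. ADD dst, s1, s2
    RegisterImmediate, // e.g. LOADI dst, imm
}

/// Encoded size: one opcode byte plus the operand bytes for the given layout.
pub fn byte_size(shape: OperandShape) -> usize {
    const OPCODE_BYTES: usize = 1;
    let operand_bytes = match shape {
        OperandShape::NoOperands => 0,
        // One operand byte in both cases, per the table.
        OperandShape::OneRegister | OperandShape::TwoRegisters => 1,
        OperandShape::ThreeRegisters => 2,
        // One register byte plus an 8-byte (u64) immediate.
        OperandShape::RegisterImmediate => 1 + 8,
    };
    OPCODE_BYTES + operand_bytes
}
```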
Error Types
ParseError
pub enum ParseError {
UnexpectedToken { expected: String, found: String, line: usize },
UnexpectedEof,
InvalidRegister { register: String, line: usize },
DuplicateLabel { label: String, line: usize, first_line: usize },
}
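UnexpectedToken: The parser encountered a token it did not expect. Carries the expected token type, the token actually found, and the line number.
UnexpectedEof: The input ended while a statement was still being parsed.
InvalidRegister: A register operand is not one of R0-R15. Carries the offending register text and the line number.
DuplicateLabel: A label is defined more than once. Carries the label name, the line of the repeated definition, and the line of the first definition (first_line).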
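DuplicateLabel reports both the repeated definition and where the label first appeared. One way to produce that, sketched here over hypothetical (label, line) pairs in source order, is to remember the first line of each label in a map while scanning. The function check_duplicate_labels is hypothetical and ParseError is a local stand-in holding only this variant.

```rust
use std::collections::HashMap;

/// Local stand-in for the documented DuplicateLabel variant.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    DuplicateLabel { label: String, line: usize, first_line: usize },
}

/// Hypothetical check over (label name, line number) pairs in source order.
pub fn check_duplicate_labels(labels: &[(&str, usize)]) -> Result<(), ParseError> {
    // Maps each label to the line where it was first defined.
    let mut first_seen: HashMap<&str, usize> = HashMap::new();
    for &(label, line) in labels {
        if let Some(&first_line) = first_seen.get(label) {
            return Err(ParseError::DuplicateLabel {
                label: label.to_string(),
                line,
                first_line,
            });
        }
        first_seen.insert(label, line);
    }
    Ok(())
}
```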
Usage Examples
Basic Parsing
use minichain_assembler::parser::Parser;
let source = r#"
LOADI R0, 10
HALT
"#;
let program = Parser::parse(source).unwrap();
assert_eq!(program.statements.len(), 2);
Parsing with Labels
use minichain_assembler::parser::{Parser, Statement};
let source = r#"
main:
LOADI R0, 10
HALT
"#;
let program = Parser::parse(source).unwrap();
// First statement is the label
match &program.statements[0] {
    Statement::Label(name) => assert_eq!(name, "main"),
    _ => panic!("Expected label"),
}
Entry Point Directive
use minichain_assembler::parser::Parser;
let source = r#"
.entry main
main:
LOADI R0, 42
HALT
"#;
let program = Parser::parse(source).unwrap();
assert_eq!(program.entry_point, Some("main".to_string()));
Constant Directive
use minichain_assembler::parser::{Parser, Statement, Directive};
let source = r#"
.const MAX_VALUE 100
LOADI R0, MAX_VALUE
"#;
let program = Parser::parse(source).unwrap();
match &program.statements[0] {
    Statement::Directive(Directive::Const(name, value)) => {
        assert_eq!(name, "MAX_VALUE");
        assert_eq!(*value, 100);
    }
    _ => panic!("Expected const directive"),
}
Three-Register Instructions
use minichain_assembler::parser::{Parser, Statement, Instruction};
let source = "ADD R2, R0, R1";
let program = Parser::parse(source).unwrap();
match &program.statements[0] {
    Statement::Instruction(Instruction::Add { dst, s1, s2 }) => {
        assert_eq!(*dst, 2);
        assert_eq!(*s1, 0);
        assert_eq!(*s2, 1);
    }
    _ => panic!("Expected ADD instruction"),
}
Label References
use minichain_assembler::parser::{Parser, Statement, Instruction};
let source = "LOADI R5, loop_start";
let program = Parser::parse(source).unwrap();
match &program.statements[0] {
    Statement::Instruction(Instruction::LoadILabel { dst, label }) => {
        assert_eq!(*dst, 5);
        assert_eq!(label, "loop_start");
    }
    _ => panic!("Expected LoadILabel"),
}
Instruction Byte Sizes
use minichain_assembler::parser::Instruction;
// No operands: 1 byte
assert_eq!(Instruction::Halt.byte_size(), 1);
// Single register: 2 bytes
assert_eq!(Instruction::Jump { target: 0 }.byte_size(), 2);
// Two registers: 2 bytes
assert_eq!(Instruction::Mov { dst: 0, src: 1 }.byte_size(), 2);
// Three registers: 3 bytes
assert_eq!(
    Instruction::Add { dst: 0, s1: 1, s2: 2 }.byte_size(),
    3
);
// Register + immediate: 10 bytes
assert_eq!(
    Instruction::LoadI { dst: 0, value: 100 }.byte_size(),
    10
);
Error Handling
use minichain_assembler::parser::{Parser, ParseError};
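// ADD needs three register operands, so this input is incomplete.
let source = "ADD R0";
match Parser::parse(source) {
    Err(ParseError::UnexpectedToken { expected, found, line }) => {
        assert_eq!(line, 1);
        println!("Expected: {}, Found: {}", expected, found);
    }
    // Running out of input mid-instruction may be reported this way instead.
    Err(ParseError::UnexpectedEof) => println!("Unexpected end of input"),
    Err(other) => panic!("Unexpected error kind: {:?}", other),
    Ok(_) => panic!("Expected parse error"),
}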
Complete Program Example
use minichain_assembler::parser::Parser;
let source = r#"
.entry main
.const ITERATIONS 10
main:
LOADI R0, 0 ; counter
LOADI R1, ITERATIONS ; max iterations
LOADI R5, loop_body ; jump target (R5 is not modified inside the loop)
loop_body:
ADDI R0, R0, 1 ; increment counter
LT R2, R0, R1 ; check if counter < max
JUMPI R2, R5 ; continue if true
HALT
"#;
let program = Parser::parse(source).unwrap();
assert_eq!(program.entry_point, Some("main".to_string()));
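Because byte_size gives each instruction's encoded length, the byte offset of every label can be computed before any bytecode is emitted. The sketch below assumes code starts at offset 0, that labels and directives occupy no bytes, and that LOADI with a label operand is 10 bytes like any register + immediate form. It flattens statements into a hypothetical Item enum carrying each instruction's size. For the complete program above, main lands at 0 and loop_body after the three 10-byte LOADI instructions, at offset 30.

```rust
use std::collections::HashMap;

/// Hypothetical flattened statement: a label, an instruction with its size, or a directive.
pub enum Item<'a> {
    Label(&'a str),
    Instruction { size: usize },
    Directive, // assumed to emit no bytes
}

/// Assign each label the byte offset of the instruction that follows it.
pub fn label_offsets<'a>(items: &[Item<'a>]) -> HashMap<&'a str, usize> {
    let mut offsets = HashMap::new();
    let mut pc: usize = 0; // current byte offset in the output
    for item in items {
        match item {
            Item::Label(name) => {
                offsets.insert(*name, pc);
            }
            Item::Instruction { size } => pc += *size,
            Item::Directive => {}
        }
    }
    offsets
}
```

A compiler would typically run a pass like this first, then substitute the offsets into LoadILabel operands on a second pass.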
Parsing Workflow
- Tokenization: The lexer converts the source code into tokens
- Statement Parsing: Each statement is parsed into one of:
  - Labels (an identifier followed by a colon)
  - Instructions (an opcode followed by its operands)
  - Directives (starting with .)
- Operand Validation: Register numbers, immediate values, and labels are validated
- Entry Point Tracking: The .entry directive sets the program entry point
- AST Construction: All statements are collected into a Program structure
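The three statement kinds can be told apart from a line's shape alone. The sketch below works on raw source lines rather than lexer tokens, which is a simplification of the workflow above; LineKind and classify_line are hypothetical names, and treating ; as the comment marker is taken from the complete example rather than from a documented rule.

```rust
/// Hypothetical classification of one source line.
#[derive(Debug, PartialEq)]
pub enum LineKind<'a> {
    Empty,
    Label(&'a str),
    Directive { name: &'a str, args: Vec<&'a str> },
    Instruction { mnemonic: &'a str, operands: Vec<&'a str> },
}

/// Hypothetical line classifier; the real parser consumes tokens from the lexer.
pub fn classify_line(line: &str) -> LineKind<'_> {
    // Drop a trailing `;` comment, then surrounding whitespace.
    let code = line.split(';').next().unwrap_or("").trim();
    if code.is_empty() {
        return LineKind::Empty;
    }
    // Label: identifier followed by a colon.
    if let Some(name) = code.strip_suffix(':') {
        return LineKind::Label(name.trim());
    }
    // Directive: starts with '.', arguments separated by whitespace.
    if let Some(rest) = code.strip_prefix('.') {
        let mut parts = rest.split_whitespace();
        let name = parts.next().unwrap_or("");
        return LineKind::Directive { name, args: parts.collect() };
    }
    // Instruction: mnemonic, then comma-separated operands.
    let (mnemonic, rest) = code.split_once(char::is_whitespace).unwrap_or((code, ""));
    let operands = rest
        .split(',')
        .map(str::trim)
        .filter(|op| !op.is_empty())
        .collect();
    LineKind::Instruction { mnemonic, operands }
}
```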
Instruction Syntax
No Operands
HALT
NOP
RET
Single Register
JUMP R5
LOG R0
Two Registers
MOV R0, R1
NOT R2, R3
LOAD8 R0, R5
Three Registers
ADD R0, R1, R2
MUL R3, R4, R5
EQ R6, R7, R8
Register + Immediate
LOADI R0, 42
LOADI R1, 0xFF
ADDI R2, R3, 10
Register + Label
LOADI R5, loop_start
LOADI R6, end_address
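The value operand of LOADI can be a decimal immediate (42), a hexadecimal immediate (0xFF), or a bare identifier such as loop_start, which the parser turns into a LoadILabel. A sketch of that decision, assuming 0x (or 0X) is the only non-decimal prefix and that identifiers start with a letter or underscore; Operand and parse_value_operand are hypothetical names.

```rust
/// Hypothetical result of reading the value operand of LOADI.
#[derive(Debug, PartialEq)]
pub enum Operand<'a> {
    Immediate(u64),
    Label(&'a str),
}

/// Decimal or 0x-prefixed hex becomes an immediate; an identifier becomes a label.
/// Returns None for malformed numbers such as "0xZZ" or values above u64::MAX.
pub fn parse_value_operand(token: &str) -> Option<Operand<'_>> {
    if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
        return u64::from_str_radix(hex, 16).ok().map(Operand::Immediate);
    }
    let first = token.chars().next()?;
    if first.is_ascii_digit() {
        return token.parse::<u64>().ok().map(Operand::Immediate);
    }
    // Identifier: letter or underscore first, then letters, digits, or underscores.
    if (first.is_ascii_alphabetic() || first == '_')
        && token.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
    {
        return Some(Operand::Label(token));
    }
    None
}
```

Parsing into u64 matches the 8-byte immediate field, so out-of-range literals are rejected here rather than silently truncated.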
Related Modules
- Lexer: Tokenizes assembly source
- Compiler: Compiles the AST to bytecode