
Overview

The lexer module provides fast tokenization for assembly language source code using the logos crate. It converts assembly source text into a stream of tokens for parsing.

Token Types

The lexer recognizes the following token categories:

Control Flow Instructions

  • HALT: Halts program execution
  • NOP: No operation (does nothing)
  • JUMP: Unconditional jump instruction
  • JUMPI: Conditional jump instruction
  • CALL: Call subroutine
  • RET: Return from subroutine
  • REVERT: Revert execution

Arithmetic Instructions

  • ADD: Addition operation
  • SUB: Subtraction operation
  • MUL: Multiplication operation
  • DIV: Division operation
  • MOD: Modulo operation
  • ADDI: Add immediate value

Bitwise Instructions

  • AND: Bitwise AND
  • OR: Bitwise OR
  • XOR: Bitwise XOR
  • NOT: Bitwise NOT
  • SHL: Shift left
  • SHR: Shift right

Comparison Instructions

  • EQ: Equal comparison
  • NE: Not equal comparison
  • LT: Less than comparison
  • GT: Greater than comparison
  • LE: Less than or equal comparison
  • GE: Greater than or equal comparison
  • ISZERO: Check if value is zero

Memory Instructions

  • LOAD8: Load 8-bit value from memory
  • LOAD64: Load 64-bit value from memory
  • STORE8: Store 8-bit value to memory
  • STORE64: Store 64-bit value to memory
  • MSIZE: Get memory size
  • MCOPY: Copy memory region

Storage Instructions

  • SLOAD: Load from persistent storage
  • SSTORE: Store to persistent storage

Immediate Instructions

  • LOADI: Load immediate value into register
  • MOV: Move value between registers

Context Instructions

  • CALLER: Get caller address
  • CALLVALUE: Get call value
  • ADDRESS: Get current contract address
  • BLOCKNUMBER: Get current block number
  • TIMESTAMP: Get block timestamp
  • GAS: Get remaining gas

Debug Instructions

  • LOG: Log value for debugging

Operands and Symbols

  • Register(u8): Register reference (R0-R15)
  • Number(u64): Decimal number literal
  • HexNumber(u64): Hexadecimal number literal (0x prefix)
  • Identifier(String): Label or constant name
  • Directive(String): Assembler directive (starts with .)
  • Comma: Comma separator
  • Colon: Colon for label definitions

Lexer API

Lexer::new

pub fn new(source: &'source str) -> Self

Creates a new lexer for the given source code.

Parameters:
  • source (&str, required): Assembly source code to tokenize

Returns:
  • Lexer<'source>: A new lexer instance ready to tokenize the source

Lexer::span

pub fn span(&self) -> std::ops::Range<usize>

Returns the byte range of the current token in the source.

Returns:
  • Range<usize>: Byte range of the current token

Lexer::slice

pub fn slice(&self) -> &'source str

Returns the string slice of the current token.

Returns:
  • &'source str: String content of the current token
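
The two accessors above are related by a simple invariant: slice() is the source text covered by the byte range that span() reports. A stdlib-only sketch of that relationship (token_text and the hard-coded span are illustrative, not part of the crate's API):

```rust
// Sketch: slice() is equivalent to indexing the source with span().
fn token_text(source: &str, span: std::ops::Range<usize>) -> &str {
    &source[span]
}

fn main() {
    let source = "LOADI R0, 10";
    // Suppose span() reports 0..5 while the lexer sits on the first token.
    let span = 0..5;
    // slice() would then return "LOADI", the text at that byte range.
    assert_eq!(token_text(source, span), "LOADI");
    println!("span/slice invariant holds");
}
```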

Iterator Implementation

The Lexer implements Iterator with Item = (Token, usize): each item pairs a token with the line number it appears on.

fn next(&mut self) -> Option<(Token, usize)>

Returns:
  • Option<(Token, usize)>: The next token and its line number, or None at end of input

Features

  • Case Insensitive: All instruction mnemonics are case-insensitive (ADD, add, Add all work)
  • Line Tracking: Each token is tagged with its line number for error reporting
  • Comment Support: Line comments starting with ; are automatically skipped
  • Whitespace Handling: Spaces, tabs, and newlines are automatically skipped
  • Error Recovery: Invalid characters are converted to error tokens
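
The line-tracking feature can be illustrated without the crate itself: a 1-based line number for any token is the count of newlines before the token's starting byte offset, plus one. A stdlib-only sketch (line_of is a hypothetical helper, not the crate's implementation):

```rust
// Sketch: derive a 1-based line number from a token's starting byte offset
// by counting the newlines that precede it in the source.
fn line_of(source: &str, byte_offset: usize) -> usize {
    source[..byte_offset].bytes().filter(|&b| b == b'\n').count() + 1
}

fn main() {
    let source = "LOADI R0, 10\nHALT";
    // "HALT" starts at byte offset 13, which is on line 2.
    assert_eq!(line_of(source, 13), 2);
    // "LOADI" starts at offset 0, on line 1.
    assert_eq!(line_of(source, 0), 1);
    println!("line tracking ok");
}
```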

Usage Examples

Basic Tokenization

use minichain_assembler::lexer::Lexer;

let source = "LOADI R0, 10";
let tokens: Vec<_> = Lexer::new(source)
    .map(|(token, _line)| token)
    .collect();

// tokens: [LoadI, Register(0), Comma, Number(10)]

With Line Numbers

use minichain_assembler::lexer::Lexer;

let source = r#"
LOADI R0, 10
ADD R1, R0, R0
HALT
"#;

for (token, line) in Lexer::new(source) {
    println!("Line {}: {:?}", line, token);
}

Handling Labels and Comments

use minichain_assembler::lexer::Lexer;

let source = r#"
main:           ; Entry point
    LOADI R0, 10
    HALT
"#;

let tokens: Vec<_> = Lexer::new(source)
    .map(|(token, _)| token)
    .collect();

// Comments are automatically stripped
// tokens: [Identifier("main"), Colon, LoadI, Register(0), Comma, Number(10), Halt]

Hexadecimal Numbers

use minichain_assembler::lexer::Lexer;

let source = "LOADI R0, 0xFF";
let tokens: Vec<_> = Lexer::new(source)
    .map(|(token, _)| token)
    .collect();

// tokens: [LoadI, Register(0), Comma, HexNumber(255)]
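
The conversion from the "0xFF" slice to the stored value 255 can be sketched with the standard library alone: strip the 0x prefix and parse the remaining digits in base 16. parse_hex is a hypothetical helper for illustration, not the crate's actual callback:

```rust
// Sketch: turn a hex literal slice like "0xFF" into its u64 value.
fn parse_hex(slice: &str) -> Option<u64> {
    let digits = slice.strip_prefix("0x")?;
    u64::from_str_radix(digits, 16).ok()
}

fn main() {
    assert_eq!(parse_hex("0xFF"), Some(255));
    assert_eq!(parse_hex("0x10"), Some(16));
    // No 0x prefix: not a hex literal under the documented pattern.
    assert_eq!(parse_hex("FF"), None);
    println!("hex ok");
}
```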

Register Range

use minichain_assembler::lexer::Lexer;

let source = "R0 R15 R16";
let tokens: Vec<_> = Lexer::new(source)
    .map(|(token, _)| token)
    .collect();

// R0-R15 are valid registers
// R16 is out of range and becomes an Identifier
// tokens: [Register(0), Register(15), Identifier("R16")]
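
The fallback behavior above can be sketched in stdlib-only Rust. The crate's actual rule is defined with logos pattern attributes; parse_register here is a hypothetical helper showing only the R0-R15 range check and the fall-through for out-of-range names:

```rust
// Sketch: accept R0-R15 (case-insensitive prefix) and reject anything else,
// letting a name like "R16" fall through to the identifier rule instead.
fn parse_register(slice: &str) -> Option<u8> {
    let digits = slice.strip_prefix('R').or_else(|| slice.strip_prefix('r'))?;
    match digits.parse::<u8>() {
        Ok(n) if n <= 15 => Some(n),
        _ => None, // e.g. "R16" would be lexed as Identifier("R16")
    }
}

fn main() {
    assert_eq!(parse_register("R0"), Some(0));
    assert_eq!(parse_register("r15"), Some(15));
    assert_eq!(parse_register("R16"), None);
    println!("register ok");
}
```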

Directives

use minichain_assembler::lexer::Lexer;

let source = ".entry main";
let tokens: Vec<_> = Lexer::new(source)
    .map(|(token, _)| token)
    .collect();

// tokens: [Directive("entry"), Identifier("main")]

Token Patterns

  • Instructions: case-insensitive keywords (HALT, NOP, ADD, etc.)
  • Registers: [Rr][0-9] or [Rr]1[0-5] (R0-R15)
  • Decimal Numbers: [0-9]+
  • Hex Numbers: 0x[0-9a-fA-F]+
  • Identifiers: [a-zA-Z_][a-zA-Z0-9_]*
  • Directives: \.[a-z]+
  • Comments: ;[^\n]* (skipped automatically)
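
The identifier pattern [a-zA-Z_][a-zA-Z0-9_]* can be checked without a regex engine: the first character must be an ASCII letter or underscore, and every later character an ASCII letter, digit, or underscore. is_identifier is a stdlib-only illustration of the pattern, not the crate's code:

```rust
// Sketch: check a slice against the documented identifier pattern
// [a-zA-Z_][a-zA-Z0-9_]* using only the standard library.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false, // empty string or invalid first character
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    assert!(is_identifier("main"));
    assert!(is_identifier("_loop2"));
    assert!(!is_identifier("2fast")); // cannot start with a digit
    println!("identifier ok");
}
```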
