Lexer

The Lexer struct splits a hex-encoded EVM bytecode string into individual bytes for execution. It strips the optional “0x” prefix, validates the characters it reads, and supports character-by-character traversal.

Structure

pub struct Lexer<'a> {
    pub bytecode: &'a str,
    pub position: u64,
    pub read_position: u64,
    pub ch: char,
}

Fields

bytecode: The hex string of bytecode (without “0x” prefix)
position: Current character position being processed
read_position: Next character position to be read
ch: Current character being examined
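
To make the relationship between the three cursor fields concrete, here is a minimal sketch of how they might advance together. CursorSketch is a hypothetical stand-in, not the library's Lexer, and the starting state (ch set to '\0' before the first read) is an assumption; only the '\0' end-of-input marker is taken from the examples on this page.

```rust
// Hypothetical stand-in for the documented Lexer, written only to
// illustrate how the cursor fields could relate to one another.
pub struct CursorSketch<'a> {
    pub bytecode: &'a str,
    pub position: u64,
    pub read_position: u64,
    pub ch: char,
}

impl<'a> CursorSketch<'a> {
    // Assumption: the cursor starts before the first character,
    // with ch holding the null character.
    pub fn new(bytecode: &'a str) -> Self {
        CursorSketch { bytecode, position: 0, read_position: 0, ch: '\0' }
    }

    // Load the character at read_position into ch, then advance.
    // Past the end of the input, ch becomes '\0'.
    pub fn read_char(&mut self) {
        let idx = self.read_position as usize;
        self.ch = match self.bytecode.as_bytes().get(idx) {
            Some(&b) => b as char,
            None => '\0',
        };
        self.position = self.read_position;
        self.read_position += 1;
    }
}
```

In this sketch position always names the character currently held in ch, while read_position runs one step ahead of it.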

Methods

new()

Creates a new lexer instance from a bytecode string.

bytecode (&str, required): The bytecode string (with or without “0x” prefix)

Returns Result<Lexer, LexerError>: a new lexer instance, or LexerError::UnableToCreateLexer if the input is invalid

let bytecode = "0x6080";
let lexer = Lexer::new(bytecode)?;
assert_eq!(lexer.bytecode, "6080");

read_char()

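Reads the next character from the bytecode into ch and advances the position. The examples on this page call it once after new() to load the first character; once the input is exhausted, ch is '\0'.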

let bytecode = "0x608060";
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();
assert_eq!(lexer.ch, '6');

next_byte()

Reads and returns the next byte (two hex characters) from the bytecode.

Returns Result<String, Box<dyn Error>>: the next byte as a two-character hex string, or an error if the input is invalid

let bytecode = "0x608011fa";
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

assert_eq!(lexer.next_byte()?, "60");
assert_eq!(lexer.next_byte()?, "80");
assert_eq!(lexer.next_byte()?, "11");
assert_eq!(lexer.next_byte()?, "fa");
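
Because next_byte() returns each byte as a hex string, a caller that needs the numeric value has to convert it. A minimal sketch of that conversion follows; hex_byte_to_u8 is a hypothetical helper and not part of the documented API. It relies only on the standard library's u8::from_str_radix.

```rust
use std::num::ParseIntError;

// Hypothetical helper (not part of the documented API): converts a
// two-character hex string such as "60" into its numeric value.
pub fn hex_byte_to_u8(byte: &str) -> Result<u8, ParseIntError> {
    // Radix 16 accepts both lowercase and uppercase hex digits.
    u8::from_str_radix(byte, 16)
}
```

With this helper, "60" becomes 0x60 and "fa" becomes 0xfa, which is the form an opcode table would typically be keyed on.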

Usage Examples

Parsing Bytecode with Prefix

The lexer automatically strips the “0x” prefix:

let bytecode = "0x12ffcb";
let lexer = Lexer::new(bytecode)?;
assert_eq!(lexer.bytecode, "12ffcb");

Iterating Through Bytes

let bytecode = "0x608011facddb";
let expected_bytes = vec!["60", "80", "11", "fa", "cd", "db"];

let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

let mut index = 0;
while lexer.ch != '\0' {
    let byte = lexer.next_byte()?;
    assert_eq!(byte, expected_bytes[index]);
    index += 1;
}

Character-by-Character Reading

let bytecode = "0x11ffcdaa";
let expected_chars = vec!['1', '1', 'f', 'f', 'c', 'd', 'a', 'a'];

let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

let mut index = 0;
while lexer.ch != '\0' {
    assert_eq!(lexer.ch, expected_chars[index]);
    lexer.read_char();
    index += 1;
}

Error Handling

The lexer validates bytecode and returns specific errors:

Whitespace in Bytecode

let bytecode = "0x60 80";
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();
lexer.next_byte()?; // First byte OK

let result = lexer.next_byte();
assert!(matches!(result, Err(e) 
    if e.downcast_ref::<LexerError>() == Some(&LexerError::HasWhitespace)));

Empty Character

let bytecode = "0x1"; // Only one nibble
let mut lexer = Lexer::new(bytecode)?;

let result = lexer.next_byte();
assert!(matches!(result, Err(e) 
    if e.downcast_ref::<LexerError>() == Some(&LexerError::EmptyChar)));

Invalid Hex Characters

let bytecode = "0xzz"; // Non-hex characters
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

let result = lexer.next_byte();
assert!(matches!(result, Err(e) 
    if e.downcast_ref::<LexerError>() == Some(&LexerError::InvalidNibble)));

Lexer Errors

The lexer can return the following errors:

UnableToCreateLexer: Failed to strip “0x” prefix or invalid input
HasWhitespace: Bytecode contains whitespace characters
EmptyChar: Encountered null character (bytecode too short)
InvalidNibble: Non-hexadecimal character in bytecode
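
The error-handling examples compare a boxed error against a specific variant with downcast_ref. The sketch below shows one way an error type could be shaped for that to work. The variant names come from the list above, but the derives, the messages, and the is_lexer_error helper are assumptions made for illustration, not the library's definition.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical definition: variant names are taken from the list above,
// everything else is assumed for the purpose of this sketch.
#[derive(Debug, PartialEq)]
pub enum LexerError {
    UnableToCreateLexer,
    HasWhitespace,
    EmptyChar,
    InvalidNibble,
}

impl fmt::Display for LexerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            LexerError::UnableToCreateLexer => "unable to create lexer",
            LexerError::HasWhitespace => "bytecode contains whitespace",
            LexerError::EmptyChar => "unexpected end of bytecode",
            LexerError::InvalidNibble => "non-hexadecimal character",
        };
        write!(f, "{}", msg)
    }
}

// Implementing Error lets the enum travel inside Box<dyn Error>.
impl Error for LexerError {}

// Hypothetical helper: recover the concrete variant from a type-erased
// error and compare it with the expected one.
pub fn is_lexer_error(err: &(dyn Error + 'static), expected: &LexerError) -> bool {
    err.downcast_ref::<LexerError>() == Some(expected)
}
```

PartialEq is what allows the == comparison against Some(&LexerError::...), and the Error implementation is what makes downcast_ref available on the boxed value.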

Integration with VM

The lexer is used internally by the VM to parse bytecode:

let vm = Vm::new("0x600160020101", false)?;
// VM automatically creates a lexer and parses the bytecode

During execution, the VM calls:

lexer.read_char() once to initialize
lexer.next_byte() repeatedly to fetch opcodes and operands, including the operand bytes of multi-byte instructions like PUSH
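
The fetch pattern described above can be sketched as a small decoding loop. This is an illustration only: decode is a hypothetical function that works on a plain hex string instead of the documented Lexer so that it stays self-contained, and rejecting a truncated PUSH operand is a simplification chosen for the sketch. The opcode range 0x60 to 0x7f for PUSH1 to PUSH32 is standard EVM.

```rust
// Hypothetical decoding loop; returns (opcode, operand bytes) pairs.
pub fn decode(hex: &str) -> Result<Vec<(u8, Vec<u8>)>, String> {
    let hex = hex.strip_prefix("0x").unwrap_or(hex);
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("non-hex character".to_string());
    }
    if hex.len() % 2 != 0 {
        return Err("odd number of hex characters".to_string());
    }

    // Convert each pair of hex characters into one byte.
    let bytes = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect::<Result<Vec<u8>, String>>()?;

    let mut out = Vec::new();
    let mut pc = 0;
    while pc < bytes.len() {
        let op = bytes[pc];
        pc += 1;
        // PUSH1..PUSH32 are opcodes 0x60..=0x7f and carry 1..=32 operand bytes.
        let n = if (0x60..=0x7f).contains(&op) { (op - 0x5f) as usize } else { 0 };
        if pc + n > bytes.len() {
            return Err("truncated PUSH operand".to_string());
        }
        out.push((op, bytes[pc..pc + n].to_vec()));
        pc += n;
    }
    Ok(out)
}
```

For the bytecode used above, "0x600160020101", this yields two PUSH1 instructions with operands 0x01 and 0x02, followed by two single-byte instructions with opcode 0x01.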

Bytecode Format

The lexer expects:

Valid hex characters: 0-9, a-f, A-F
Even length: Each byte is two hex characters
Optional prefix: “0x” is automatically removed
No whitespace: Spaces, tabs, newlines are invalid

// Valid bytecode examples
Lexer::new("0x6080")       // With prefix
Lexer::new("6080")         // Without prefix
Lexer::new("0x600160020101") // Multiple instructions

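// Invalid bytecode examples (as the Error Handling examples show, new() can
// succeed and the error surfaces when next_byte() reaches the bad characters)
Lexer::new("0x60 80")      // Contains whitespace
Lexer::new("0x608")        // Odd length
Lexer::new("0xGG")         // Non-hex characters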
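
If you would rather reject malformed input before constructing a lexer, the four rules above can be checked up front. The sketch below is one way to do that; check_format and FormatIssue are hypothetical names and are not part of the library.

```rust
// Hypothetical pre-check, independent of the Lexer itself: applies the
// format rules listed above and reports the first one violated.
#[derive(Debug, PartialEq)]
pub enum FormatIssue {
    Whitespace,
    NonHex(char),
    OddLength,
}

pub fn check_format(input: &str) -> Result<&str, FormatIssue> {
    // The prefix is optional, so fall back to the input unchanged.
    let hex = input.strip_prefix("0x").unwrap_or(input);
    for c in hex.chars() {
        if c.is_whitespace() {
            return Err(FormatIssue::Whitespace);
        }
        if !c.is_ascii_hexdigit() {
            return Err(FormatIssue::NonHex(c));
        }
    }
    // Each byte needs exactly two hex characters.
    if hex.len() % 2 != 0 {
        return Err(FormatIssue::OddLength);
    }
    Ok(hex)
}
```

On success the function returns the bytecode without its prefix, so the result can be passed straight to whatever consumes the hex string next.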
