The Lexer struct tokenizes EVM bytecode into individual bytes for execution. It handles bytecode parsing, validation, and character-by-character traversal.

Structure

pub struct Lexer<'a> {
    pub bytecode: &'a str,
    pub position: u64,
    pub read_position: u64,
    pub ch: char,
}

Fields

  • bytecode: The hex string of bytecode (without “0x” prefix)
  • position: Current character position being processed
  • read_position: Next character position to be read
  • ch: Current character being examined

Methods

new()

Creates a new lexer instance from a bytecode string.
  • bytecode (&str, required): The bytecode string (with or without “0x” prefix)
  • Returns Result<Lexer, LexerError>: A new lexer instance, or LexerError::UnableToCreateLexer if the input is invalid
let bytecode = "0x6080";
let lexer = Lexer::new(bytecode)?;
assert_eq!(lexer.bytecode, "6080");

read_char()

Reads the next character from the bytecode and advances the position.
let bytecode = "0x608060";
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();
assert_eq!(lexer.ch, '6');
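
How the three cursor fields move together can be sketched with a small stand-alone type. This is an illustration only: `Cursor` and its `advance` method are hypothetical names, the indices are `usize` rather than the `u64` used by `Lexer`, and the real `read_char()` may be written differently. The sketch assumes ASCII input and uses '\0' as the end-of-input marker, as the loops in the usage examples suggest.

```rust
/// Hypothetical stand-in for the lexer's cursor; not the crate's actual code.
pub struct Cursor<'a> {
    input: &'a str,
    pub position: usize,      // index of the character held in `ch`
    pub read_position: usize, // index of the character to read next
    pub ch: char,             // '\0' before the first read and at end of input
}

impl<'a> Cursor<'a> {
    pub fn new(input: &'a str) -> Self {
        Cursor { input, position: 0, read_position: 0, ch: '\0' }
    }

    /// Loads the next character into `ch` and moves both indices forward.
    pub fn advance(&mut self) {
        // Assumes ASCII input, so byte indices and character indices coincide.
        self.ch = match self.input.as_bytes().get(self.read_position) {
            Some(&b) => b as char,
            None => '\0',
        };
        self.position = self.read_position;
        self.read_position += 1;
    }
}
```

Keeping `position` one step behind `read_position` lets a caller inspect the current character while still knowing where the next read will come from.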

next_byte()

Reads and returns the next byte (two hex characters) from the bytecode.
  • Returns Result<String, Box<dyn Error>>: The next byte as a two-character hex string, or an error if the input is invalid
let bytecode = "0x608011fa";
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

assert_eq!(lexer.next_byte()?, "60");
assert_eq!(lexer.next_byte()?, "80");
assert_eq!(lexer.next_byte()?, "11");
assert_eq!(lexer.next_byte()?, "fa");
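
Because `next_byte()` yields each byte as a two-character hex string, a caller that needs the numeric value has to convert it. A minimal sketch using the standard library's `u8::from_str_radix`; the helper name `hex_byte_to_u8` is hypothetical and not part of the lexer.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: converts a two-character hex string such as "fa"
/// into its numeric value.
pub fn hex_byte_to_u8(byte: &str) -> Result<u8, ParseIntError> {
    u8::from_str_radix(byte, 16)
}
```

With this helper, the string "60" returned by the lexer becomes the value 0x60, which can then be matched against an opcode table.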

Usage Examples

Parsing Bytecode with Prefix

The lexer automatically strips the “0x” prefix:
let bytecode = "0x12ffcb";
let lexer = Lexer::new(bytecode)?;
assert_eq!(lexer.bytecode, "12ffcb");

Iterating Through Bytes

let bytecode = "0x608011facddb";
let expected_bytes = vec!["60", "80", "11", "fa", "cd", "db"];

let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

let mut index = 0;
while lexer.ch != '\0' {
    let byte = lexer.next_byte()?;
    assert_eq!(byte, expected_bytes[index]);
    index += 1;
}

Character-by-Character Reading

let bytecode = "0x11ffcdaa";
let expected_chars = vec!['1', '1', 'f', 'f', 'c', 'd', 'a', 'a'];

let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

let mut index = 0;
while lexer.ch != '\0' {
    assert_eq!(lexer.ch, expected_chars[index]);
    lexer.read_char();
    index += 1;
}

Error Handling

The lexer validates bytecode and returns specific errors:

Whitespace in Bytecode

let bytecode = "0x60 80";
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();
lexer.next_byte()?; // First byte OK

let result = lexer.next_byte();
assert!(matches!(result, Err(e) 
    if e.downcast_ref::<LexerError>() == Some(&LexerError::HasWhitespace)));

Empty Character

let bytecode = "0x1"; // Only one nibble
let mut lexer = Lexer::new(bytecode)?;

let result = lexer.next_byte();
assert!(matches!(result, Err(e) 
    if e.downcast_ref::<LexerError>() == Some(&LexerError::EmptyChar)));

Invalid Hex Characters

let bytecode = "0xzz"; // Non-hex characters
let mut lexer = Lexer::new(bytecode)?;
lexer.read_char();

let result = lexer.next_byte();
assert!(matches!(result, Err(e) 
    if e.downcast_ref::<LexerError>() == Some(&LexerError::InvalidNibble)));

Lexer Errors

The lexer can return the following errors:
  • UnableToCreateLexer: The “0x” prefix could not be stripped, or the input is otherwise invalid
  • HasWhitespace: Bytecode contains whitespace characters
  • EmptyChar: Encountered null character (bytecode too short)
  • InvalidNibble: Non-hexadecimal character in bytecode
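
The `downcast_ref` checks in the Error Handling examples work for any type that implements `std::error::Error`. The sketch below shows one way such an enum could be declared. The variant names come from the list above, but the derives, the messages, and the `is_lexer_error` helper are assumptions made for illustration, not the crate's definition.

```rust
use std::error::Error;
use std::fmt;

/// Illustrative declaration; the crate's real `LexerError` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LexerError {
    UnableToCreateLexer,
    HasWhitespace,
    EmptyChar,
    InvalidNibble,
}

impl fmt::Display for LexerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Messages are placeholders chosen for this sketch.
        let msg = match self {
            LexerError::UnableToCreateLexer => "unable to create lexer",
            LexerError::HasWhitespace => "bytecode contains whitespace",
            LexerError::EmptyChar => "unexpected end of bytecode",
            LexerError::InvalidNibble => "invalid hex character",
        };
        f.write_str(msg)
    }
}

impl Error for LexerError {}

/// Hypothetical helper: true if `err` is the expected `LexerError` variant.
pub fn is_lexer_error(err: &(dyn Error + 'static), expected: &LexerError) -> bool {
    err.downcast_ref::<LexerError>() == Some(expected)
}
```

Deriving `PartialEq` is what makes the `== Some(&LexerError::...)` comparison possible after the downcast.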

Integration with VM

The lexer is used internally by the VM to parse bytecode:
let vm = Vm::new("0x600160020101", false)?;
// VM automatically creates a lexer and parses the bytecode
During execution, the VM:
  1. Calls lexer.read_char() to initialize
  2. Calls lexer.next_byte() repeatedly to fetch opcodes and operands
  3. Handles multi-byte instructions like PUSH
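
Multi-byte handling matters because a PUSH opcode is followed by immediate data that must not be decoded as opcodes. In the EVM, PUSH1 through PUSH32 occupy opcodes 0x60 through 0x7f and carry 1 to 32 operand bytes. The sketch below groups a hex string into instructions on that basis. `split_instructions` is a hypothetical stand-alone function that works on the whole string at once; it does not reflect how the VM or the lexer is actually written.

```rust
/// Hypothetical helper: splits unprefixed hex bytecode into (opcode, operands) pairs.
/// Returns None on odd length, non-hex input, or a PUSH that runs past the end.
pub fn split_instructions(hex: &str) -> Option<Vec<(u8, Vec<u8>)>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Decode every pair of hex characters into one byte.
    let bytes: Vec<u8> = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect::<Option<Vec<u8>>>()?;

    let mut out = Vec::new();
    let mut pc = 0;
    while pc < bytes.len() {
        let opcode = bytes[pc];
        // PUSH1..=PUSH32 are 0x60..=0x7f; PUSHn carries n operand bytes.
        let operand_len = if (0x60..=0x7f).contains(&opcode) {
            (opcode - 0x5f) as usize
        } else {
            0
        };
        let end = pc + 1 + operand_len;
        // `get` returns None if the operands would run past the end of the code.
        let operands = bytes.get(pc + 1..end)?.to_vec();
        out.push((opcode, operands));
        pc = end;
    }
    Some(out)
}
```

Applied to the bytecode from the example above, "600160020101" splits into PUSH1 0x01, PUSH1 0x02, and two ADD (0x01) instructions.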

Bytecode Format

The lexer expects:
  • Valid hex characters: 0-9, a-f, A-F
  • Even length: Each byte is two hex characters
  • Optional prefix: “0x” is automatically removed
  • No whitespace: Spaces, tabs, newlines are invalid
// Valid bytecode examples
Lexer::new("0x6080")       // With prefix
Lexer::new("6080")         // Without prefix
Lexer::new("0x600160020101") // Multiple instructions

// Invalid bytecode examples (as shown under Error Handling, these fail when bytes are read)
Lexer::new("0x60 80")      // Contains whitespace
Lexer::new("0x608")        // Odd length
Lexer::new("0xGG")         // Non-hex characters
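
The four format rules listed above can also be checked up front, before any byte is read. The following is a minimal sketch of such a check. `FormatError` and `validate_bytecode` are hypothetical names introduced here; the lexer itself reports problems through `LexerError`.

```rust
/// Hypothetical error type for this sketch; the lexer itself uses `LexerError`.
#[derive(Debug, PartialEq, Eq)]
pub enum FormatError {
    Whitespace,
    NonHex(char),
    OddLength,
}

/// Hypothetical helper: strips an optional "0x" prefix and checks the remaining rules.
/// On success, returns the hex digits without the prefix.
pub fn validate_bytecode(input: &str) -> Result<&str, FormatError> {
    let hex = input.strip_prefix("0x").unwrap_or(input);
    for c in hex.chars() {
        if c.is_whitespace() {
            return Err(FormatError::Whitespace);
        }
        if !c.is_ascii_hexdigit() {
            return Err(FormatError::NonHex(c));
        }
    }
    // Every character is an ASCII hex digit here, so len() counts characters.
    if hex.len() % 2 != 0 {
        return Err(FormatError::OddLength);
    }
    Ok(hex)
}
```

Validating once at the boundary gives callers a single place to reject malformed input, at the cost of scanning the string before lexing begins.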
