The tokenizer module provides a complete API for tokenizing Lua source code into individual tokens. Unlike parsing, tokenization operates at a lower level and can be used independently to analyze code structure, whitespace, and comments.

Key Components

  • Token - Individual tokens with position information
  • TokenReference - Tokens with leading/trailing trivia
  • Position - Exact position tracking in source code
  • Lexer - The tokenization engine

Basic Usage

The tokenizer can be used independently from the parser:
use full_moon::tokenizer::Lexer;
use full_moon::LuaVersion;

let source = "local x = 5";
let lexer = Lexer::new(source, LuaVersion::lua51());
let tokens = lexer.collect().unwrap();

Lexer

The main entry point for tokenization.
pub struct Lexer {
    pub lua_version: LuaVersion,
    // internal fields omitted
}

Methods

new

Creates a new Lexer from the given source string and Lua version(s).
pub fn new(source: &str, lua_version: LuaVersion) -> Self

current

Returns the current token.
pub fn current(&self) -> Option<&LexerResult<TokenReference>>

peek

Returns the next token without advancing the lexer.
pub fn peek(&self) -> Option<&LexerResult<TokenReference>>

consume

Consumes the current token and returns it.
pub fn consume(&mut self) -> Option<LexerResult<TokenReference>>

collect

Returns a vector of all tokens left in the source string.
pub fn collect(self) -> LexerResult<Vec<Token>>

process_next

Processes and returns the next token in the source string, ignoring trivia.
pub fn process_next(&mut self) -> Option<LexerResult<Token>>
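The methods above can also drive the lexer token by token instead of collecting everything at once. A minimal sketch, assuming consume returns None once the source is exhausted (if the lexer instead emits a final end-of-file token, the loop should break on it):

```rust
use full_moon::tokenizer::Lexer;
use full_moon::LuaVersion;

let source = "local x = 5";
let mut lexer = Lexer::new(source, LuaVersion::lua51());

// Walk the token stream manually instead of calling `collect`.
while let Some(result) = lexer.consume() {
    // `consume` yields a LexerResult<TokenReference>; unwrapping is
    // only reasonable in examples where the source is known to be valid.
    let token = result.unwrap();
    println!("{:?}", token.token_type());
}
```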

LexerResult

The result of a lexer operation.
pub enum LexerResult<T> {
    /// The lexer operation was successful.
    Ok(T),
    /// The lexer operation was unsuccessful, and could not be recovered.
    Fatal(Vec<TokenizerError>),
    /// The lexer operation was unsuccessful, but some result can be extracted.
    Recovered(T, Vec<TokenizerError>),
}

Methods

unwrap

Unwraps the result, panicking if it is not LexerResult::Ok.
pub fn unwrap(self) -> T

unwrap_errors

Unwraps the errors, panicking if it is LexerResult::Ok.
pub fn unwrap_errors(self) -> Vec<TokenizerError>

errors

Returns the errors, if there were any.
pub fn errors(self) -> Vec<TokenizerError>
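Rather than calling unwrap, callers that want to keep going after recoverable errors can match on the variants directly. A sketch using the enum defined above (the sample source with an unterminated string is only illustrative):

```rust
use full_moon::tokenizer::{Lexer, LexerResult};
use full_moon::LuaVersion;

let source = "local s = 'unterminated";
let lexer = Lexer::new(source, LuaVersion::lua51());

match lexer.collect() {
    LexerResult::Ok(tokens) => {
        println!("{} tokens, no errors", tokens.len());
    }
    LexerResult::Recovered(tokens, errors) => {
        // A usable (partial) token stream plus the errors recovered from.
        println!("{} tokens with {} errors", tokens.len(), errors.len());
    }
    LexerResult::Fatal(errors) => {
        eprintln!("could not tokenize: {} errors", errors.len());
    }
}
```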

Error Handling

TokenizerErrorType

The possible errors that can happen while tokenizing.
pub enum TokenizerErrorType {
    /// An unclosed multi-line comment was found
    UnclosedComment,
    /// An unclosed string was found
    UnclosedString,
    /// An invalid number was found
    InvalidNumber,
    /// An unexpected token was found
    UnexpectedToken(char),
    /// Symbol passed is not valid
    /// Returned from `TokenReference::symbol`
    InvalidSymbol(String),
}

TokenizerError

Information about an error that occurs while tokenizing.
pub struct TokenizerError {
    // internal fields
}

Methods

/// The type of error
pub fn error(&self) -> &TokenizerErrorType

/// The position of the first token that caused the error
pub fn position(&self) -> Position

/// The range of the token that caused the error
pub fn range(&self) -> (Position, Position)
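The accessors above combine naturally into a diagnostic formatter. A sketch assuming Position exposes line() and character() accessors and that TokenizerErrorType implements Debug; the report helper is hypothetical, not part of the crate:

```rust
use full_moon::tokenizer::TokenizerError;

// Hypothetical helper: render a TokenizerError as a human-readable string.
fn report(error: &TokenizerError) -> String {
    let (start, end) = error.range();
    format!(
        "{:?} at line {}, column {} (through line {}, column {})",
        error.error(),
        start.line(),
        start.character(),
        end.line(),
        end.character(),
    )
}
```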

Independent Tokenization

The tokenizer can be used independently from parsing, which is useful for:
  • Syntax highlighting
  • Code formatting tools
  • Comment extraction
  • Whitespace analysis
  • Creating custom parsers
use full_moon::tokenizer::{Lexer, TokenType};
use full_moon::LuaVersion;

let source = "-- comment\nlocal x = 5";
let lexer = Lexer::new(source, LuaVersion::lua51());
let tokens = lexer.collect().unwrap();

// Analyze individual tokens including comments and whitespace
for token in tokens {
    match token.token_type() {
        TokenType::SingleLineComment { comment } => {
            println!("Comment: {}", comment);
        }
        TokenType::Identifier { identifier } => {
            println!("Identifier: {}", identifier);
        }
        _ => {}
    }
}
