The tokenizer module provides a complete API for tokenizing Lua source code into individual tokens. Unlike parsing, tokenization operates at a lower level and can be used independently to analyze code structure, whitespace, and comments.

Key Components

  • Token - Individual tokens with position information
  • TokenReference - Tokens with leading/trailing trivia
  • Position - Exact position tracking in source code
  • Lexer - The tokenization engine

Basic Usage

The tokenizer can be used independently from the parser:
use full_moon::tokenizer::Lexer;
use full_moon::LuaVersion;

let source = "local x = 5";
let lexer = Lexer::new(source, LuaVersion::lua51());
let tokens = lexer.collect().unwrap();

Lexer

The main entry point for tokenization.
pub struct Lexer {
    pub lua_version: LuaVersion,
    // internal fields omitted
}

Methods

new

Creates a new Lexer from the given source string and Lua version(s).
pub fn new(source: &str, lua_version: LuaVersion) -> Self

current

Returns the current token.
pub fn current(&self) -> Option<&LexerResult<TokenReference>>

peek

Returns the next token without advancing the lexer.
pub fn peek(&self) -> Option<&LexerResult<TokenReference>>

consume

Consumes the current token and returns it.
pub fn consume(&mut self) -> Option<LexerResult<TokenReference>>

collect

Returns a vector of all tokens left in the source string.
pub fn collect(self) -> LexerResult<Vec<Token>>

process_next

Processes and returns the next token in the source string, ignoring trivia.
pub fn process_next(&mut self) -> Option<LexerResult<Token>>
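The methods above can also drive the lexer token by token instead of collecting everything at once. A minimal sketch, assuming consume returns None once the source is exhausted (if the lexer instead emits a final end-of-file token, the loop should break on it):

```rust
use full_moon::tokenizer::Lexer;
use full_moon::LuaVersion;

let source = "local x = 5";
let mut lexer = Lexer::new(source, LuaVersion::lua51());

// Walk the token stream manually instead of calling `collect`.
while let Some(result) = lexer.consume() {
    // `consume` yields a LexerResult<TokenReference>; unwrapping is
    // only reasonable in examples where the source is known to be valid.
    let token = result.unwrap();
    println!("{:?}", token.token_type());
}
```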

LexerResult

The result of a lexer operation.
pub enum LexerResult<T> {
    /// The lexer operation was successful.
    Ok(T),
    /// The lexer operation was unsuccessful, and could not be recovered.
    Fatal(Vec<TokenizerError>),
    /// The lexer operation was unsuccessful, but some result can be extracted.
    Recovered(T, Vec<TokenizerError>),
}

Methods

unwrap

Unwraps the result, panicking if it is not LexerResult::Ok.
pub fn unwrap(self) -> T

unwrap_errors

Unwraps the errors, panicking if it is LexerResult::Ok.
pub fn unwrap_errors(self) -> Vec<TokenizerError>

errors

Returns the errors, if there were any.
pub fn errors(self) -> Vec<TokenizerError>
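Rather than calling unwrap, callers that want to keep going after recoverable errors can match on the variants directly. A sketch using the enum defined above (the sample source with an unterminated string is only illustrative):

```rust
use full_moon::tokenizer::{Lexer, LexerResult};
use full_moon::LuaVersion;

let source = "local s = 'unterminated";
let lexer = Lexer::new(source, LuaVersion::lua51());

match lexer.collect() {
    LexerResult::Ok(tokens) => {
        println!("{} tokens, no errors", tokens.len());
    }
    LexerResult::Recovered(tokens, errors) => {
        // A usable (partial) token stream plus the errors recovered from.
        println!("{} tokens with {} errors", tokens.len(), errors.len());
    }
    LexerResult::Fatal(errors) => {
        eprintln!("could not tokenize: {} errors", errors.len());
    }
}
```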

Error Handling

TokenizerErrorType

The possible errors that can happen while tokenizing.
pub enum TokenizerErrorType {
    /// An unclosed multi-line comment was found
    UnclosedComment,
    /// An unclosed string was found
    UnclosedString,
    /// An invalid number was found
    InvalidNumber,
    /// An unexpected token was found
    UnexpectedToken(char),
    /// Symbol passed is not valid
    /// Returned from `TokenReference::symbol`
    InvalidSymbol(String),
}

TokenizerError

Information about an error that occurs while tokenizing.
pub struct TokenizerError {
    // internal fields
}

Methods

/// The type of error
pub fn error(&self) -> &TokenizerErrorType

/// The position of the first token that caused the error
pub fn position(&self) -> Position

/// The range of the token that caused the error
pub fn range(&self) -> (Position, Position)
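The accessors above combine naturally into a diagnostic formatter. A sketch assuming Position exposes line() and character() accessors and that TokenizerErrorType implements Debug; the report helper is hypothetical, not part of the crate:

```rust
use full_moon::tokenizer::TokenizerError;

// Hypothetical helper: render a TokenizerError as a human-readable string.
fn report(error: &TokenizerError) -> String {
    let (start, end) = error.range();
    format!(
        "{:?} at line {}, column {} (through line {}, column {})",
        error.error(),
        start.line(),
        start.character(),
        end.line(),
        end.character(),
    )
}
```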

Independent Tokenization

The tokenizer can be used independently from parsing, which is useful for:
  • Syntax highlighting
  • Code formatting tools
  • Comment extraction
  • Whitespace analysis
  • Creating custom parsers
use full_moon::tokenizer::{Lexer, TokenType};
use full_moon::LuaVersion;

let source = "-- comment\nlocal x = 5";
let lexer = Lexer::new(source, LuaVersion::lua51());
let tokens = lexer.collect().unwrap();

// Analyze individual tokens including comments and whitespace
for token in tokens {
    match token.token_type() {
        TokenType::SingleLineComment { comment } => {
            println!("Comment: {}", comment);
        }
        TokenType::Identifier { identifier } => {
            println!("Identifier: {}", identifier);
        }
        _ => {}
    }
}
