How full-moon converts Lua source code into tokens
Tokenization is the first step in parsing Lua code with Full Moon. The tokenizer (also called a lexer) converts a stream of characters into a sequence of meaningful tokens while preserving all formatting information.
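To make "preserving all formatting information" concrete, here is a minimal, standard-library-only sketch of a lossless splitter. It is not full-moon's lexer: the `Piece` enum and the `split_lossless`, `take_while`, and `rebuild` functions are hypothetical names invented for this illustration, and the token categories are far coarser than real Lua tokens. The point is only that whitespace is kept as its own piece, so concatenating the pieces reproduces the input exactly.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical token categories for this sketch (much coarser than full-moon's `TokenType`).
#[derive(Debug, Clone, PartialEq)]
enum Piece {
    Word(String),       // identifiers and keywords
    Number(String),     // runs of ASCII digits
    Whitespace(String), // spaces, tabs, newlines: kept, not discarded
    Other(char),        // any other single character
}

/// Consumes characters from `chars` while `keep` returns true and collects them.
fn take_while(chars: &mut Peekable<Chars<'_>>, keep: impl Fn(char) -> bool) -> String {
    let mut out = String::new();
    while let Some(&c) = chars.peek() {
        if !keep(c) {
            break;
        }
        out.push(c);
        chars.next();
    }
    out
}

/// Splits `source` into pieces without dropping any character.
fn split_lossless(source: &str) -> Vec<Piece> {
    let mut pieces = Vec::new();
    let mut chars = source.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            pieces.push(Piece::Whitespace(take_while(&mut chars, |c| c.is_whitespace())));
        } else if c.is_ascii_digit() {
            pieces.push(Piece::Number(take_while(&mut chars, |c| c.is_ascii_digit())));
        } else if c.is_alphabetic() || c == '_' {
            pieces.push(Piece::Word(take_while(&mut chars, |c| c.is_alphanumeric() || c == '_')));
        } else {
            chars.next();
            pieces.push(Piece::Other(c));
        }
    }
    pieces
}

/// Concatenates the pieces back into source text.
fn rebuild(pieces: &[Piece]) -> String {
    let mut out = String::new();
    for piece in pieces {
        match piece {
            Piece::Word(s) | Piece::Number(s) | Piece::Whitespace(s) => out.push_str(s),
            Piece::Other(c) => out.push(*c),
        }
    }
    out
}
```

Calling `rebuild(&split_lossless(source))` returns a string equal to `source`, which is the round-trip property a formatting-preserving tokenizer relies on.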
From src/tokenizer/structs.rs:427-467, a token represents a single meaningful unit of code:
    /// A token consisting of its [`Position`] and a [`TokenType`]
    pub struct Token {
        pub(crate) start_position: Position,
        pub(crate) end_position: Position,
        pub(crate) token_type: TokenType,
    }

    impl Token {
        /// The position a token begins at
        pub fn start_position(&self) -> Position

        /// The position a token ends at
        pub fn end_position(&self) -> Position

        /// The type of token as well as the data needed to represent it
        pub fn token_type(&self) -> &TokenType

        /// The kind of token with no additional data
        pub fn token_kind(&self) -> TokenKind
    }
From src/tokenizer/structs.rs:852-887, every token tracks its exact position:
    /// Used to represent exact positions of tokens in code
    pub struct Position {
        pub(crate) bytes: usize,
        pub(crate) line: usize,
        pub(crate) character: usize,
    }

    impl Position {
        /// How many bytes, ignoring lines, it would take to find this position
        pub fn bytes(self) -> usize {
            self.bytes
        }

        /// Index of the character on the line for this position
        pub fn character(self) -> usize {
            self.character
        }

        /// Line the position lies on
        pub fn line(self) -> usize {
            self.line
        }
    }
Example:
    local x = 1
    print(x)
local: Position { bytes: 0, line: 1, character: 1 }
x (line 1): Position { bytes: 6, line: 1, character: 7 }
print: Position { bytes: 12, line: 2, character: 1 }
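The three positions above can be reproduced with a short standard-library sketch. The `Pos` struct and `position_at` function are hypothetical stand-ins, not full-moon's own code, and the sketch assumes 1-based lines and characters, `\n` line endings, and that `character` counts Unicode scalar values rather than bytes.

```rust
/// Stand-in for the three fields described above (not full-moon's `Position` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Pos {
    bytes: usize,
    line: usize,
    character: usize,
}

/// Computes the position of the character that starts at `byte_offset` in `source`.
/// Assumption: lines and characters are 1-based and lines end with `\n`.
fn position_at(source: &str, byte_offset: usize) -> Pos {
    let mut line = 1;
    let mut character = 1;
    for (index, ch) in source.char_indices() {
        if index >= byte_offset {
            break;
        }
        if ch == '\n' {
            // A newline moves to the start of the next line.
            line += 1;
            character = 1;
        } else {
            character += 1;
        }
    }
    Pos { bytes: byte_offset, line, character }
}
```

For the source `"local x = 1\nprint(x)"`, `position_at(source, 12)` yields line 2, character 1, because the first line plus its newline occupies bytes 0 through 11.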
From src/tokenizer/structs.rs:199-230, tokenization can fail with:
    pub enum TokenizerErrorType {
        /// An unclosed multi-line comment was found
        UnclosedComment,
        /// An unclosed string was found
        UnclosedString,
        /// An invalid number was found
        InvalidNumber,
        /// An unexpected token was found
        UnexpectedToken(char),
        /// Symbol passed is not valid
        InvalidSymbol(String),
    }
Example:
    use full_moon::tokenizer::{Lexer, LexerResult, LuaVersion};

    let bad_code = "local x = \"unclosed string";
    let mut lexer = Lexer::new_lazy(bad_code, LuaVersion::new());

    // Pull tokens one at a time and report any fatal tokenizer errors
    while let Some(result) = lexer.process_next() {
        if let LexerResult::Fatal(errors) = result {
            for error in errors {
                println!("Error: {}", error);
                // Prints: "unclosed string (line:1, char:11)"
            }
        }
    }
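To illustrate how an unclosed string can be detected and tied to a column, here is a standard-library-only sketch. It is not how full-moon implements `UnclosedString`: `ScanError` and `first_string_literal` are hypothetical names, and the sketch only handles double-quoted strings on a single line, with backslash escapes.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ScanError {
    /// The opening quote at this 1-based character column was never closed.
    UnclosedString { character: usize },
}

/// Returns the first double-quoted string literal in `line` (quotes included),
/// `Ok(None)` if the line has no quote, or an error if the string never closes.
fn first_string_literal(line: &str) -> Result<Option<&str>, ScanError> {
    let open = match line.find('"') {
        Some(index) => index,
        None => return Ok(None),
    };

    let mut escaped = false;
    for (offset, ch) in line[open + 1..].char_indices() {
        match ch {
            // The previous character was a backslash, so this one is taken literally.
            _ if escaped => escaped = false,
            '\\' => escaped = true,
            '"' => {
                let close = open + 1 + offset;
                return Ok(Some(&line[open..=close]));
            }
            _ => {}
        }
    }

    // Ran out of input before a closing quote: report where the string began.
    let character = line[..open].chars().count() + 1;
    Err(ScanError::UnclosedString { character })
}
```

With the input `local x = "unclosed string`, the opening quote is the eleventh character, so the sketch reports `UnclosedString { character: 11 }`.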
From src/tokenizer/structs.rs:606-738, you can create tokens for code generation:
    use full_moon::tokenizer::{Symbol, TokenReference, TokenType};

    // Create a symbol with whitespace
    let return_token = TokenReference::symbol("return ")?;

    assert_eq!(return_token.token().token_type(), &TokenType::Symbol {
        symbol: Symbol::Return,
    });

    // Leading trivia: none
    // Token: Symbol::Return
    // Trailing trivia: one space
Use TokenReference::symbol() to create tokens with proper trivia parsing. The input string can include leading and trailing whitespace.
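The split into leading trivia, token, and trailing trivia can be pictured with a small standard-library sketch. The `split_trivia` function is hypothetical and handles whitespace only; real trivia in full-moon also covers comments, and `TokenReference::symbol()` additionally validates that the middle part is a known symbol.

```rust
/// Splits `text` into (leading whitespace, core text, trailing whitespace).
/// Assumption: trivia is whitespace only; comments are not handled here.
fn split_trivia(text: &str) -> (&str, &str, &str) {
    let without_leading = text.trim_start();
    // Everything removed by `trim_start` is the leading trivia.
    let leading = &text[..text.len() - without_leading.len()];
    let core = without_leading.trim_end();
    // Everything after the core text is the trailing trivia.
    let trailing = &without_leading[core.len()..];
    (leading, core, trailing)
}
```

For example, `split_trivia("return ")` gives an empty leading part, `"return"` as the core, and a single space as the trailing part, matching the breakdown in the comments above.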