The lexer module provides tokenization of SQL source text into a stream of tokens. It handles keywords, identifiers, literals, operators, and comments.
Lexer
The main lexer struct that tokenizes SQL source code.
pub struct Lexer<'a> {
    /// The original source string
    pub source: &'a str,
    /// The remaining unparsed portion of the source
    pub rest: &'a str,
    /// Current byte position in the source
    pub position: usize,
    /// Cached next token for lookahead
    pub peeked: Option<Result<Token<'a>, SQLError<'a>>>,
}
Constructor
pub fn new(source: &'a str) -> Self
Creates a new lexer for the given source string.
Example:
use databas_sql_parser::lexer::Lexer;
let lexer = Lexer::new("SELECT * FROM users;");
Methods
peek
pub fn peek(&mut self) -> Option<&Result<Token<'a>, SQLError<'a>>>
Peeks at the next token without consuming it. Returns a reference to the next token result.
Example:
use databas_sql_parser::lexer::Lexer;
let mut lexer = Lexer::new("SELECT * FROM users");
if let Some(Ok(token)) = lexer.peek() {
    println!("Next token: {:?}", token.kind);
}
expect_token
pub fn expect_token(&mut self, expected: TokenKind<'a>) -> Result<(), SQLError<'a>>
Consumes the next token and verifies it matches the expected token kind. Returns an error if the token doesn’t match or if the input ends unexpectedly.
Example:
use databas_sql_parser::lexer::Lexer;
use databas_sql_parser::lexer::token_kind::{TokenKind, Keyword};
let mut lexer = Lexer::new("SELECT * FROM users");
lexer.expect_token(TokenKind::Keyword(Keyword::Select)).unwrap();
expect_where
pub fn expect_where(
&mut self,
check: impl Fn(TokenKind<'a>) -> bool,
) -> Result<(), SQLError<'a>>
Consumes the next token and verifies it satisfies the given predicate.
check
impl Fn(TokenKind<'a>) -> bool
Predicate function to validate the token
Example:
use databas_sql_parser::lexer::Lexer;
use databas_sql_parser::lexer::token_kind::TokenKind;
let mut lexer = Lexer::new("123 + 456");
lexer.expect_where(|kind| matches!(kind, TokenKind::Number(_))).unwrap();
Iterator Implementation
The Lexer implements Iterator, producing tokens until the end of input.
impl<'a> Iterator for Lexer<'a> {
type Item = Result<Token<'a>, SQLError<'a>>;
fn next(&mut self) -> Option<Self::Item>;
}
Example:
use databas_sql_parser::lexer::Lexer;
let lexer = Lexer::new("SELECT id, name FROM users");
for token_result in lexer {
    match token_result {
        Ok(token) => println!("{}", token),
        Err(e) => eprintln!("Error: {}", e),
    }
}
Token
Represents a single token with its kind and position.
pub struct Token<'a> {
    /// The type and value of the token
    pub kind: TokenKind<'a>,
    /// Byte offset where this token starts in the source
    pub offset: usize,
}
Example:
use databas_sql_parser::lexer::Lexer;
let mut lexer = Lexer::new("SELECT");
let token = lexer.next().unwrap().unwrap();
assert_eq!(token.offset, 0);
println!("Token at position {}: {}", token.offset, token.kind);
TokenKind
Enumerates all possible token types.
pub enum TokenKind<'a> {
    String(&'a str),
    Identifier(&'a str),
    Keyword(Keyword),
    Number(NumberKind),
    LeftParen,
    RightParen,
    Plus,
    Minus,
    Equals,
    NotEquals,
    EqualsEquals,
    LessThan,
    GreaterThan,
    LessThanOrEqual,
    GreaterThanOrEqual,
    Asterisk,
    Comma,
    Semicolon,
    Slash,
}
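Several of the operator variants share a leading character (Equals vs EqualsEquals, LessThan vs LessThanOrEqual, GreaterThan vs GreaterThanOrEqual). As a rough, self-contained illustration of how a lexer typically disambiguates them by looking one character ahead; the OpKind enum and lex_op function here are hypothetical stand-ins, not part of this crate:

```rust
// Illustrative mirror of a subset of TokenKind, defined locally.
#[derive(Debug, PartialEq)]
enum OpKind {
    Equals,
    EqualsEquals,
    LessThan,
    LessThanOrEqual,
    GreaterThan,
    GreaterThanOrEqual,
}

// Classify the operator at the start of `input`, returning the token
// kind and the number of bytes consumed.
fn lex_op(input: &str) -> Option<(OpKind, usize)> {
    let mut chars = input.chars();
    let first = chars.next()?;
    let second = chars.next();
    match (first, second) {
        ('=', Some('=')) => Some((OpKind::EqualsEquals, 2)),
        ('=', _) => Some((OpKind::Equals, 1)),
        ('<', Some('=')) => Some((OpKind::LessThanOrEqual, 2)),
        ('<', _) => Some((OpKind::LessThan, 1)),
        ('>', Some('=')) => Some((OpKind::GreaterThanOrEqual, 2)),
        ('>', _) => Some((OpKind::GreaterThan, 1)),
        _ => None,
    }
}
```

The one-character lookahead is why `<=` lexes as a single LessThanOrEqual token rather than LessThan followed by Equals.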
Keyword
SQL keywords recognized by the lexer.
pub enum Keyword {
    Select,
    From,
    Where,
    Order,
    By,
    Asc,
    Desc,
    True,
    False,
    And,
    Or,
    Not,
    Limit,
    Offset,
    Insert,
    Into,
    Values,
    Create,
    Table,
    Int,
    Float,
    Text,
    Aggregate(Aggregate),
    Primary,
    Key,
    Nullable,
}
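Keywords are matched case-insensitively (see Case-Insensitive Keywords below). A common way to implement this is to lowercase the candidate word before lookup; the following self-contained sketch shows the idea with a locally defined enum mirroring a few Keyword variants, not the crate's own type:

```rust
// Illustrative mirror of a few Keyword variants, defined locally.
#[derive(Debug, PartialEq)]
enum Kw {
    Select,
    From,
    Where,
}

// Map a word to a keyword, ignoring ASCII case; None means the word
// should be treated as a plain identifier instead.
fn lookup_keyword(word: &str) -> Option<Kw> {
    match word.to_ascii_lowercase().as_str() {
        "select" => Some(Kw::Select),
        "from" => Some(Kw::From),
        "where" => Some(Kw::Where),
        _ => None,
    }
}
```
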
Aggregate
Aggregate function keywords.
pub enum Aggregate {
    Sum,
    Avg,
    StdDev,
    Min,
    Max,
    Count,
}
NumberKind
Numeric literal types.
pub enum NumberKind {
    Integer(i32),
    Float(f32),
}
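A minimal sketch of how a lexer might choose between the two variants, assuming a decimal point marks a float; the Num enum and classify_number function are local illustrations mirroring NumberKind, not the crate's own code:

```rust
// Illustrative mirror of NumberKind, defined locally.
#[derive(Debug, PartialEq)]
enum Num {
    Integer(i32),
    Float(f32),
}

// Classify a numeric literal: a '.' selects the float variant,
// otherwise the literal is parsed as an integer.
fn classify_number(lit: &str) -> Option<Num> {
    if lit.contains('.') {
        lit.parse::<f32>().ok().map(Num::Float)
    } else {
        lit.parse::<i32>().ok().map(Num::Integer)
    }
}
```
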
The lexer automatically skips whitespace and comments:
- Line comments: start with -- and continue to the end of the line
- Block comments: enclosed in /* */
Example:
use databas_sql_parser::lexer::Lexer;
let lexer = Lexer::new("
    -- This is a comment
    SELECT /* inline comment */ id
    FROM users
");
for token in lexer {
    // Comments are automatically filtered out
    println!("{:?}", token);
}
String Literals
The lexer supports both single and double-quoted strings:
use databas_sql_parser::lexer::Lexer;
use databas_sql_parser::lexer::token_kind::TokenKind;
let mut lexer = Lexer::new("'hello' \"world\"");
let token1 = lexer.next().unwrap().unwrap();
assert!(matches!(token1.kind, TokenKind::String("hello")));
let token2 = lexer.next().unwrap().unwrap();
assert!(matches!(token2.kind, TokenKind::String("world")));
Case-Insensitive Keywords
Keywords are recognized case-insensitively:
use databas_sql_parser::lexer::Lexer;
use databas_sql_parser::lexer::token_kind::TokenKind;
let lexer = Lexer::new("sEleCT FrOm WhErE");
for result in lexer {
    if let Ok(token) = result {
        // All recognized as keywords despite mixed case
        assert!(matches!(token.kind, TokenKind::Keyword(_)));
    }
}
Error Handling
The lexer returns detailed errors for invalid input:
use databas_sql_parser::lexer::Lexer;
use databas_sql_parser::error::{SQLError, SQLErrorKind};
let mut lexer = Lexer::new("\"unterminated string");
let result = lexer.next().unwrap();
assert!(matches!(
result,
Err(SQLError { kind: SQLErrorKind::UnterminatedString, .. })
));
Unicode Support
Identifiers support Unicode alphabetic characters:
use databas_sql_parser::lexer::Lexer;
use databas_sql_parser::lexer::token_kind::TokenKind;
let mut lexer = Lexer::new("åäö");
let token = lexer.next().unwrap().unwrap();
assert!(matches!(token.kind, TokenKind::Identifier("åäö")));