
Overview

The Lexer class performs lexical analysis on AXON source code, converting raw text into a sequence of typed tokens. This is the first phase of the compilation pipeline.
from axon import Lexer

source = '''
persona LegalExpert {
    domain: ["contract law", "compliance"]
    tone: analytical
}
'''

lexer = Lexer(source, filename="example.axon")
tokens = lexer.tokenize()

for token in tokens:
    print(f"{token.type.name}: {token.value!r} (line {token.line})")

Class: Lexer

Constructor

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | str | required | The AXON source code to tokenize |
| filename | str | `"<stdin>"` | The source file name for error messages |

Methods

tokenize() -> list[Token]

Scan the entire source and return all tokens.

Returns: a list of Token objects, ending with an EOF token.
Raises: AxonLexerError if invalid syntax is encountered.
lexer = Lexer(source)
tokens = lexer.tokenize()

Token Types

The lexer recognizes the following token types:

Keywords

| Token | Example |
| --- | --- |
| PERSONA | persona |
| CONTEXT | context |
| ANCHOR | anchor |
| FLOW | flow |
| STEP | step |
| REASON | reason |
| VALIDATE | validate |
| MEMORY | memory |
| TOOL | tool |
| RUN | run |
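Keyword lexers typically scan a whole word first and then check it against a keyword table, falling back to an identifier token otherwise. The sketch below illustrates that pattern using the keywords from the table above; it is not AXON's actual implementation, and `classify_word` is a hypothetical helper.

```python
# Hypothetical sketch of keyword recognition, mirroring the keyword
# table above. Any word not in the table is a plain identifier.
KEYWORDS = {
    "persona": "PERSONA", "context": "CONTEXT", "anchor": "ANCHOR",
    "flow": "FLOW", "step": "STEP", "reason": "REASON",
    "validate": "VALIDATE", "memory": "MEMORY", "tool": "TOOL",
    "run": "RUN",
}

def classify_word(word: str) -> str:
    # Dictionary lookup keeps keyword checks O(1) per word.
    return KEYWORDS.get(word, "IDENTIFIER")
```

This is why `Analyze` and `document` in the examples below come out as IDENTIFIER tokens while `flow` and `step` do not.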

Literals

| Token | Example | Description |
| --- | --- | --- |
| STRING | `"Hello world"` | Double-quoted strings with escape sequences |
| INTEGER | `42` | Integer literals |
| FLOAT | `3.14` | Floating-point literals |
| DURATION | `30s`, `5m`, `1h` | Duration literals with units |
| BOOL | `true`, `false` | Boolean literals |

Operators & Punctuation

| Token | Symbol | Description |
| --- | --- | --- |
| ARROW | `->` | Arrow operator |
| DOTDOT | `..` | Range operator |
| EQ | `==` | Equality comparison |
| NEQ | `!=` | Not equal comparison |
| LT | `<` | Less than |
| GT | `>` | Greater than |
| LTE | `<=` | Less than or equal |
| GTE | `>=` | Greater than or equal |
| LBRACE | `{` | Left brace |
| RBRACE | `}` | Right brace |
| LPAREN | `(` | Left parenthesis |
| RPAREN | `)` | Right parenthesis |
| LBRACKET | `[` | Left bracket |
| RBRACKET | `]` | Right bracket |
| COLON | `:` | Colon |
| COMMA | `,` | Comma |
| DOT | `.` | Dot |
| QUESTION | `?` | Question mark (optional type) |

Token Structure

from dataclasses import dataclass
from enum import Enum

class TokenType(Enum):
    PERSONA = "PERSONA"
    STRING = "STRING"
    # ... etc

@dataclass
class Token:
    type: TokenType
    value: str
    line: int
    column: int

Features

String Escape Sequences

The lexer supports standard escape sequences in string literals:
  • \n - newline
  • \t - tab
  • \\ - backslash
  • \" - double quote
source = r'"Line 1\nLine 2"'
lexer = Lexer(source)
tokens = lexer.tokenize()
print(tokens[0].value)
# Line 1
# Line 2
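The four escape sequences above can be decoded with a simple table-driven loop. This is an illustrative sketch of that decoding step, not the lexer's actual code; `unescape` is a hypothetical helper operating on a string body with its quotes already stripped.

```python
# Sketch: decode the four supported escape sequences in a string body.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def unescape(body: str) -> str:
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # Unknown escapes pass the character through unchanged here;
            # a real lexer might instead report an error.
            out.append(ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```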

Comment Stripping

Both line comments (//) and block comments (/* */) are automatically removed:
source = '''
// This is a comment
persona Expert { /* inline comment */ }
'''

lexer = Lexer(source)
tokens = lexer.tokenize()
# Comments are not included in the token stream

Duration Literals

Duration values with time units are recognized as single tokens:
source = "timeout: 30s"
lexer = Lexer(source)
tokens = lexer.tokenize()
# tokens[2] = Token(DURATION, "30s", ...)
Supported units: s (seconds), ms (milliseconds), m (minutes), h (hours), d (days)
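A consumer of DURATION tokens will usually normalize them to a single unit. The helper below is a hypothetical sketch of that conversion (it is not part of the axon API), covering the five units listed above.

```python
# Hypothetical helper: convert a DURATION token's text to seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_to_seconds(text: str) -> float:
    # Check "ms" before "m" and "s" so "250ms" is not misread.
    for unit in ("ms", "s", "m", "h", "d"):
        if text.endswith(unit):
            return int(text[:-len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"not a duration literal: {text!r}")
```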

Error Handling

AxonLexerError

Raised when invalid syntax is encountered:
from axon import AxonLexerError

try:
    lexer = Lexer("persona @ invalid")
    tokens = lexer.tokenize()
except AxonLexerError as e:
    print(f"Error at line {e.line}, column {e.column}: {e.message}")
Attributes:
  • message: str - Human-readable error description
  • line: int - Line number where error occurred
  • column: int - Column number where error occurred

Common Errors

# Unterminated string
lexer = Lexer('"unterminated')
# AxonLexerError: Unterminated string

# Unterminated block comment
lexer = Lexer('/* comment')
# AxonLexerError: Unterminated block comment

# Unexpected character
lexer = Lexer('persona @ Expert')
# AxonLexerError: Unexpected character '@'

Example: Full Tokenization

from axon import Lexer

source = '''
flow Analyze(document: Document) -> Analysis {
    step Extract {
        ask: "Extract key facts"
        output: FactList
    }
}
'''

lexer = Lexer(source, filename="analyze.axon")
tokens = lexer.tokenize()

for token in tokens:
    if token.type.name != "EOF":
        print(f"{token.line:3d}:{token.column:3d}  {token.type.name:15s}  {token.value!r}")
Output:
  1:  0  FLOW             'flow'
  1:  5  IDENTIFIER       'Analyze'
  1: 12  LPAREN           '('
  1: 13  IDENTIFIER       'document'
  1: 21  COLON            ':'
  1: 23  IDENTIFIER       'Document'
  1: 31  RPAREN           ')'
  1: 33  ARROW            '->'
  1: 36  IDENTIFIER       'Analysis'
  1: 45  LBRACE           '{'
  2:  4  STEP             'step'
  ...

Performance

The lexer is a single-pass scanner with O(n) time complexity where n is the length of the source code. It’s suitable for files up to several MB in size.

Next Steps

Parser API

Learn how to parse tokens into an abstract syntax tree
