
Overview

The Lexer class performs lexical analysis on AXON source code, converting raw text into a sequence of typed tokens. This is the first phase of the compilation pipeline.
from axon import Lexer

source = '''
persona LegalExpert {
    domain: ["contract law", "compliance"]
    tone: analytical
}
'''

lexer = Lexer(source, filename="example.axon")
tokens = lexer.tokenize()

for token in tokens:
    print(f"{token.type.name}: {token.value!r} (line {token.line})")

Class: Lexer

Constructor

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | str | required | The AXON source code to tokenize |
| filename | str | `"<stdin>"` | The source file name for error messages |

Methods

tokenize() -> list[Token]

Scan the entire source and return all tokens.

Returns: a list of Token objects, ending with an EOF token.
Raises: AxonLexerError if invalid syntax is encountered.
lexer = Lexer(source)
tokens = lexer.tokenize()

Token Types

The lexer recognizes the following token types:

Keywords

| Token | Example |
| --- | --- |
| PERSONA | persona |
| CONTEXT | context |
| ANCHOR | anchor |
| FLOW | flow |
| STEP | step |
| REASON | reason |
| VALIDATE | validate |
| MEMORY | memory |
| TOOL | tool |
| RUN | run |
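Keyword lexers typically scan a whole word first and then check it against a keyword table, falling back to an identifier token otherwise. The sketch below illustrates that pattern using the keywords from the table above; it is not AXON's actual implementation, and `classify_word` is a hypothetical helper.

```python
# Hypothetical sketch of keyword recognition, mirroring the keyword
# table above. Any word not in the table is a plain identifier.
KEYWORDS = {
    "persona": "PERSONA", "context": "CONTEXT", "anchor": "ANCHOR",
    "flow": "FLOW", "step": "STEP", "reason": "REASON",
    "validate": "VALIDATE", "memory": "MEMORY", "tool": "TOOL",
    "run": "RUN",
}

def classify_word(word: str) -> str:
    # Dictionary lookup keeps keyword checks O(1) per word.
    return KEYWORDS.get(word, "IDENTIFIER")
```

This is why `Analyze` and `document` in the examples below come out as IDENTIFIER tokens while `flow` and `step` do not.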

Literals

| Token | Example | Description |
| --- | --- | --- |
| STRING | `"Hello world"` | Double-quoted strings with escape sequences |
| INTEGER | `42` | Integer literals |
| FLOAT | `3.14` | Floating-point literals |
| DURATION | `30s`, `5m`, `1h` | Duration literals with units |
| BOOL | `true`, `false` | Boolean literals |

Operators & Punctuation

| Token | Symbol | Description |
| --- | --- | --- |
| ARROW | `->` | Arrow operator |
| DOTDOT | `..` | Range operator |
| EQ | `==` | Equality comparison |
| NEQ | `!=` | Not equal comparison |
| LT | `<` | Less than |
| GT | `>` | Greater than |
| LTE | `<=` | Less than or equal |
| GTE | `>=` | Greater than or equal |
| LBRACE | `{` | Left brace |
| RBRACE | `}` | Right brace |
| LPAREN | `(` | Left parenthesis |
| RPAREN | `)` | Right parenthesis |
| LBRACKET | `[` | Left bracket |
| RBRACKET | `]` | Right bracket |
| COLON | `:` | Colon |
| COMMA | `,` | Comma |
| DOT | `.` | Dot |
| QUESTION | `?` | Question mark (optional type) |

Token Structure

from dataclasses import dataclass
from enum import Enum

class TokenType(Enum):
    PERSONA = "PERSONA"
    STRING = "STRING"
    # ... etc

@dataclass
class Token:
    type: TokenType
    value: str
    line: int
    column: int

Features

String Escape Sequences

The lexer supports standard escape sequences in string literals:
  • \n - newline
  • \t - tab
  • \\ - backslash
  • \" - double quote
source = r'"Line 1\nLine 2"'
lexer = Lexer(source)
tokens = lexer.tokenize()
print(tokens[0].value)
# Line 1
# Line 2
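The four escape sequences above can be decoded with a simple table-driven loop. This is an illustrative sketch of that decoding step, not the lexer's actual code; `unescape` is a hypothetical helper operating on a string body with its quotes already stripped.

```python
# Sketch: decode the four supported escape sequences in a string body.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def unescape(body: str) -> str:
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # Unknown escapes pass the character through unchanged here;
            # a real lexer might instead report an error.
            out.append(ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```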

Comment Stripping

Both line comments (//) and block comments (/* */) are automatically removed:
source = '''
// This is a comment
persona Expert { /* inline comment */ }
'''

lexer = Lexer(source)
tokens = lexer.tokenize()
# Comments are not included in the token stream

Duration Literals

Duration values with time units are recognized as single tokens:
source = "timeout: 30s"
lexer = Lexer(source)
tokens = lexer.tokenize()
# tokens[2] = Token(DURATION, "30s", ...)
Supported units: s (seconds), ms (milliseconds), m (minutes), h (hours), d (days)
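A consumer of DURATION tokens will usually normalize them to a single unit. The helper below is a hypothetical sketch of that conversion (it is not part of the axon API), covering the five units listed above.

```python
# Hypothetical helper: convert a DURATION token's text to seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_to_seconds(text: str) -> float:
    # Check "ms" before "m" and "s" so "250ms" is not misread.
    for unit in ("ms", "s", "m", "h", "d"):
        if text.endswith(unit):
            return int(text[:-len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"not a duration literal: {text!r}")
```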

Error Handling

AxonLexerError

Raised when invalid syntax is encountered:
from axon import AxonLexerError

try:
    lexer = Lexer("persona @ invalid")
    tokens = lexer.tokenize()
except AxonLexerError as e:
    print(f"Error at line {e.line}, column {e.column}: {e.message}")
Attributes:
  • message: str - Human-readable error description
  • line: int - Line number where error occurred
  • column: int - Column number where error occurred

Common Errors

# Unterminated string
lexer = Lexer('"unterminated')
# AxonLexerError: Unterminated string

# Unterminated block comment
lexer = Lexer('/* comment')
# AxonLexerError: Unterminated block comment

# Unexpected character
lexer = Lexer('persona @ Expert')
# AxonLexerError: Unexpected character '@'

Example: Full Tokenization

from axon import Lexer

source = '''
flow Analyze(document: Document) -> Analysis {
    step Extract {
        ask: "Extract key facts"
        output: FactList
    }
}
'''

lexer = Lexer(source, filename="analyze.axon")
tokens = lexer.tokenize()

for token in tokens:
    if token.type.name != "EOF":
        print(f"{token.line:3d}:{token.column:3d}  {token.type.name:15s}  {token.value!r}")
Output:
  1:  0  FLOW             'flow'
  1:  5  IDENTIFIER       'Analyze'
  1: 12  LPAREN           '('
  1: 13  IDENTIFIER       'document'
  1: 21  COLON            ':'
  1: 23  IDENTIFIER       'Document'
  1: 31  RPAREN           ')'
  1: 33  ARROW            '->'
  1: 36  IDENTIFIER       'Analysis'
  1: 45  LBRACE           '{'
  2:  4  STEP             'step'
  ...

Performance

The lexer is a single-pass scanner with O(n) time complexity where n is the length of the source code. It’s suitable for files up to several MB in size.

Next Steps

Parser API

Learn how to parse tokens into an abstract syntax tree
