The Arc parser implements a complete ES2023+ JavaScript parser with strict mode enforcement, module support, and semantic validation. It uses recursive descent for statements and Pratt parsing for expressions.

Parser implementation

Location: src/arc/parser.gleam
Lines: 6,800+
Algorithm: Recursive descent + Pratt parsing

Key features

ES2023+ support

Full support for modern JavaScript including classes, generators, async/await, modules, optional chaining, and nullish coalescing.

Strict mode

Enforces ES2023 strict mode restrictions including reserved words, TDZ checks, and lexical scoping rules.

Module system

Complete import/export support with static analysis and validation.

Error recovery

100+ typed error variants with precise position tracking.

Architecture

The parser is a single-pass, recursive descent parser that consumes a token stream and produces a typed AST:
pub fn parse(source: String, mode: ParseMode) -> Result(ast.Program, ParseError)

Parse modes

pub type ParseMode {
  Script   // Global code (sloppy or strict)
  Module   // Always strict, top-level import/export
}
Modules are always strict mode ("use strict" is implicit). Scripts check for a directive prologue.
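The mode decision can be modeled outside Gleam; here is a minimal Python sketch of the directive-prologue scan (the AST shape and field names are illustrative, not Arc's actual structures):

```python
def is_strict(mode: str, body: list) -> bool:
    """Modules are always strict. Scripts scan the directive prologue:
    the leading run of expression statements whose expression is a
    string literal. The prologue ends at the first other statement."""
    if mode == "module":
        return True
    for stmt in body:
        if stmt.get("type") != "ExpressionStatement" or stmt.get("literal") is None:
            break  # end of the directive prologue
        if stmt["literal"] == "use strict":
            return True
    return False

script = [{"type": "ExpressionStatement", "literal": "use strict"}]
print(is_strict("script", script))  # True
print(is_strict("module", []))      # True
print(is_strict("script", []))      # False
```

Note that any string directive keeps the prologue open, so `"foo"; "use strict";` still enables strict mode.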

Recursive descent parsing

Statements are parsed using recursive descent:
fn parse_statement(p: P) -> Result(#(P, ast.Statement), ParseError) {
  case peek(p) {
    LeftBrace -> parse_block_statement(p)
    Var | Const -> parse_variable_declaration(p)
    If -> parse_if_statement(p)
    While -> parse_while_statement(p)
    For -> parse_for_statement(p)
    Function -> parse_function_declaration(p)
    Class -> parse_class_declaration(p)
    // ... 20+ statement types
  }
}
Each parse function:
  1. Consumes tokens from the parser state
  2. Recursively parses sub-structures
  3. Validates semantic constraints
  4. Returns updated parser state + AST node
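The four steps can be sketched in Python over a toy token list (handler names mirror the Gleam ones, but the token and node shapes are illustrative):

```python
def parse_statement(tokens: list):
    """Dispatch on the lookahead token; each handler consumes its own
    tokens, recursing for sub-structures, and returns (node, rest)."""
    head = tokens[0]
    if head in ("var", "let", "const"):
        return parse_variable_declaration(tokens)
    if head == "{":
        return parse_block_statement(tokens)
    # ... one arm per statement type
    raise SyntaxError(f"Unexpected token {head!r}")

def parse_variable_declaration(tokens):
    # Simplified shape: kind name "=" value ";"
    kind, name, _eq, value, _semi = tokens[:5]
    return {"kind": kind, "name": name, "init": value}, tokens[5:]

def parse_block_statement(tokens):
    body, rest = [], tokens[1:]          # consume "{"
    while rest[0] != "}":
        stmt, rest = parse_statement(rest)  # recurse for sub-structures
        body.append(stmt)
    return {"block": body}, rest[1:]     # consume "}"

node, rest = parse_statement(["{", "let", "x", "=", "1", ";", "}"])
print(node)  # {'block': [{'kind': 'let', 'name': 'x', 'init': '1'}]}
```

Here the "returns updated parser state" step is modeled by returning the remaining token list; Arc instead returns a whole new P record.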

Parser state

The parser threads an immutable P state through all parsing functions:
type P {
  P(
    tokens: List(Token),           // Remaining tokens
    mode: ParseMode,               // Script or Module
    prev_line: Int,                // Last token line (ASI)
    strict: Bool,                  // Strict mode active?
    
    // Context tracking
    function_depth: Int,           // Nested function depth
    loop_depth: Int,               // Nested loop depth
    in_generator: Bool,            // Inside generator?
    in_async: Bool,                // Inside async function?
    
    // Scope analysis
    scope_lexical: Set(String),    // let/const in current block
    scope_var: Set(String),        // var in current function
    scope_params: Set(String),     // function parameters
    
    // Module tracking
    export_names: Set(String),     // Exported names
    import_bindings: Set(String),  // Imported bindings
    
    // ... 30+ fields total
  )
}
The parser state is immutable. Every parsing operation returns a new P with updated fields. This enables backtracking (arrow function ambiguity) and keeps the parser pure.
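The same state-threading discipline can be modeled in Python with a frozen dataclass (fields abbreviated; this is a sketch, not Arc's record):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class P:
    """Immutable parser state: every operation returns a new P."""
    tokens: tuple
    strict: bool = False
    loop_depth: int = 0

def enter_loop(p: P) -> P:
    # Returns a fresh P; the old one stays valid, so a failed
    # speculative parse can simply resume from a saved state.
    return replace(p, loop_depth=p.loop_depth + 1)

p0 = P(tokens=("while", "(", "x", ")"))
p1 = enter_loop(p0)
print(p0.loop_depth, p1.loop_depth)  # 0 1
```

Because `p0` is untouched, backtracking is just "keep the old binding", which is exactly how the arrow-function ambiguity below is resolved.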

Pratt parsing for expressions

Expressions use Pratt parsing (also called “top-down operator precedence”) to handle operator precedence and associativity:
// Precedence levels (lowest to highest)
Assignment       // =, +=, etc.
Conditional      // ? :
LogicalOr        // ||
LogicalAnd       // &&
BitwiseOr        // |
BitwiseXor       // ^
BitwiseAnd       // &
Equality         // ==, ===, !=, !==
Relational       // <, >, <=, >=, in, instanceof
Shift            // <<, >>, >>>
Additive         // +, -
Multiplicative   // *, /, %
Exponentiation   // **
Unary            // !, ~, +, -, typeof, void, delete
Postfix          // ++, --
Call             // f(), f.x, f[x], new f()
Primary          // literals, identifiers

Expression parsing flow

1. Parse primary expression

Start with a primary expression (literal, identifier, (expr), etc.):
fn parse_primary_expression(p: P) -> Result(#(P, ast.Expression), ParseError)
2. Parse left-hand side

Handle member access, calls, and new:
fn parse_lhs_expression(p: P) -> Result(#(P, ast.Expression), ParseError)
Handles:
  • obj.prop, obj[key]
  • fn(), obj.method()
  • new Ctor()
  • fn?.() (optional chaining)
3. Parse binary operators

Climb the precedence ladder using Pratt’s algorithm:
fn parse_binary_expression(
  p: P,
  left: ast.Expression,
  min_precedence: Int
) -> Result(#(P, ast.Expression), ParseError)
While the next operator has precedence ≥ min_precedence:
  1. Consume operator
  2. Parse right operand with higher precedence
  3. Build binary node
  4. Loop
4. Parse assignments

Check for assignment operators and validate LHS:
fn parse_assignment_expression(p: P) -> Result(#(P, ast.Expression), ParseError)
Validates that LHS is assignable (identifier, member expression, or destructuring pattern).

Example: parsing a + b * c

  1. Parse a → primary expression
  2. See + (precedence 10)
  3. Parse RHS with precedence 11
    • Parse b → primary
    • See * (precedence 12 > 11) → consume
    • Parse c → primary
    • Return b * c
  4. Build a + (b * c)
The precedence ensures * binds tighter than + without explicit parentheses.
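The loop described above is classic precedence climbing; here is a minimal Python sketch over a toy token list (the precedence numbers match the worked example, but the table and atom handling are simplified):

```python
PRECEDENCE = {"+": 10, "-": 10, "*": 12, "/": 12}

def parse_expr(tokens: list, min_prec: int = 0):
    """While the next operator binds at least as tightly as min_prec,
    consume it and parse the right operand with a higher threshold so
    tighter operators group first. Returns (tree, remaining_tokens)."""
    left, tokens = tokens[0], tokens[1:]   # primary: a single atom
    while tokens and tokens[0] in PRECEDENCE:
        op = tokens[0]
        prec = PRECEDENCE[op]
        if prec < min_prec:
            break
        # Left-associative: the right side must bind strictly tighter.
        right, tokens = parse_expr(tokens[1:], prec + 1)
        left = (op, left, right)
    return left, tokens

tree, _ = parse_expr(["a", "+", "b", "*", "c"])
print(tree)  # ('+', 'a', ('*', 'b', 'c'))
```

Passing `prec + 1` for the right operand is what makes `+` left-associative: a second `+` at the same level fails the threshold, returns, and is consumed by the outer loop instead.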

Error handling

The parser defines 100+ typed error variants in ParseError:
pub type ParseError {
  ExpectedToken(expected: String, got: String, pos: Int)
  ExpectedIdentifier(pos: Int)
  UnexpectedToken(token: String, pos: Int)
  
  // Strict mode violations
  ReservedWordStrictMode(name: String, pos: Int)
  WithNotAllowedStrictMode(pos: Int)
  DuplicateParameterName(name: String, pos: Int)
  
  // Scope errors
  BreakOutsideLoopOrSwitch(pos: Int)
  ContinueOutsideLoop(pos: Int)
  ReturnOutsideFunction(pos: Int)
  
  // Module errors
  ImportNotTopLevel(pos: Int)
  DuplicateExport(name: String, pos: Int)
  UndeclaredExportBinding(name: String, pos: Int)
  
  // ... 100+ total variants
}
Errors are typed, not strings. Every error has a specific variant with structured data (position, names, etc.). This enables precise error reporting and IDE integration.
Convert errors to human-readable strings:
pub fn parse_error_to_string(error: ParseError) -> String
pub fn parse_error_pos(error: ParseError) -> Int
Example:
ExpectedToken(";", "}", 42)
"Expected ; but got } (at position 42)"

DuplicateParameterName("x", 100)
"Duplicate parameter name 'x' not allowed (at position 100)"

Semantic validation

The parser enforces semantic constraints during parsing:

Strict mode enforcement

// In strict mode, these are always reserved:
"implements" | "interface" | "package" | "private" | "protected" | "public"
Error(ReservedWordStrictMode(name, pos))

// eval/arguments cannot be binding names:
const eval = 42;
Error(StrictModeBindingName("eval", pos))

TDZ tracking

// Accessing let/const before declaration:
console.log(x);  // ReferenceError at runtime
let x = 42;

// Parser tracks declarations in scope_lexical:
case dict.get(state.lexical_globals, name) {
  Ok(JsUninitialized) -> // TDZ violation
  Ok(_value) -> // OK: binding is initialized
  Error(_) -> // not a lexical binding
}
The parser doesn’t reject TDZ violations (they’re runtime errors), but it tracks uninitialized bindings for compiler optimization.

Redeclaration rules

// let/const cannot redeclare in the same scope:
let x = 1;
let x = 2;  // SyntaxError
Error(IdentifierAlreadyDeclared("x", pos))

// var can redeclare:
var x = 1;
var x = 2;  // OK (same binding)
Tracked via scope_lexical and scope_var sets in P.
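A minimal Python model of these redeclaration checks, using mutable sets for brevity where Arc threads immutable ones (error messages are illustrative):

```python
def declare(scope_lexical: set, scope_var: set, kind: str, name: str) -> None:
    """let/const may not collide with any prior declaration in the
    scope; var may re-declare an existing var (it's the same binding),
    but not a lexical name."""
    if kind in ("let", "const"):
        if name in scope_lexical or name in scope_var:
            raise SyntaxError(f"Identifier '{name}' has already been declared")
        scope_lexical.add(name)
    else:  # kind == "var"
        if name in scope_lexical:
            raise SyntaxError(f"Identifier '{name}' has already been declared")
        scope_var.add(name)

lexical, var_scope = set(), set()
declare(lexical, var_scope, "var", "x")
declare(lexical, var_scope, "var", "x")      # OK: same binding
try:
    declare(lexical, var_scope, "let", "x")  # collides with the var binding
except SyntaxError as err:
    print(err)  # Identifier 'x' has already been declared
```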

Context validation

// return outside function:
return 42;
Error(ReturnOutsideFunction(pos))

// break/continue outside loop:
break;
Error(BreakOutsideLoopOrSwitch(pos))

// await outside async function:
await promise;
Error(AwaitInModule(pos))  // unless in module top-level
Tracked via function_depth, loop_depth, in_async, etc.
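These context checks reduce to comparisons against the tracked counters; a Python sketch (the error names here are illustrative, not Arc's exact variants):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ctx:
    """A slice of the context-tracking fields threaded through P."""
    function_depth: int = 0
    loop_depth: int = 0
    in_async: bool = False

def check_statement(kind: str, ctx: Ctx):
    """Return an error name when a statement appears outside the
    context that licenses it, or None if it is allowed."""
    if kind == "return" and ctx.function_depth == 0:
        return "ReturnOutsideFunction"
    if kind == "break" and ctx.loop_depth == 0:
        return "BreakOutsideLoopOrSwitch"
    if kind == "continue" and ctx.loop_depth == 0:
        return "ContinueOutsideLoop"
    if kind == "await" and not ctx.in_async:
        return "AwaitOutsideAsync"
    return None

print(check_statement("return", Ctx()))             # ReturnOutsideFunction
print(check_statement("break", Ctx(loop_depth=1)))  # None
```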

Module validation

// Imports only at top level:
if (condition) {
  import x from "./mod.js";  // SyntaxError
}
Error(ImportNotTopLevel(pos))

// No duplicate exports:
export const x = 1;
export { x };  // SyntaxError
Error(DuplicateExport("x", pos))

// Export references must exist:
export { nonexistent };
Error(UndeclaredExportBinding("nonexistent", pos))
Validated via:
  • export_names: Set(String) — exported names
  • export_local_refs: List(#(String, Int)) — validated after parsing
  • import_bindings: Set(String) — imported names
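The deferred export check (export_local_refs is validated once parsing completes) can be sketched as:

```python
def validate_exports(export_local_refs: list, declared: set) -> list:
    """After parsing, each `export { name }` recorded during the pass
    must refer to a binding declared somewhere at the top level."""
    errors = []
    for name, pos in export_local_refs:
        if name not in declared:
            errors.append(("UndeclaredExportBinding", name, pos))
    return errors

# `export { x }` resolves; `export { nonexistent }` does not:
print(validate_exports([("x", 10), ("nonexistent", 42)], declared={"x"}))
# [('UndeclaredExportBinding', 'nonexistent', 42)]
```

Deferring this check is necessary because `export { f }` may legally precede the declaration of `f` in the same module.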
Module mode changes parsing behavior:
  • Always strict (no need for "use strict")
  • Top-level await allowed
  • import/export keywords enabled
  • Top-level this is undefined (not global)
case p.mode {
  Module -> // Allow await, import, export
  Script -> // Disallow at top level
}

Cover grammars

JavaScript has ambiguous syntax that requires lookahead or backtracking:

Arrow function ambiguity

(x)        // ParenthesizedExpression
(x) => x   // ArrowFunction
The parser can’t know if (x) is a grouped expression or arrow params until it sees =>. Solution: Parse as expression, then reinterpret:
use #(p, expr) <- result.try(parse_assignment_expression(p))
case peek(p) {
  Arrow -> {
    // Reinterpret expr as parameter list
    use params <- result.try(expr_to_params(expr))
    // Parse arrow body
  }
  _ -> Ok(#(p, expr))  // Just a normal expression
}
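The reinterpretation step can be modeled in Python on a toy tuple AST (shapes are illustrative; real parameter lists also admit defaults, rest elements, and destructuring patterns):

```python
def expr_to_params(expr) -> list:
    """Reinterpret an already-parsed expression as an arrow-function
    parameter list. Only certain shapes are valid parameters; anything
    else is a syntax error discovered at the `=>`."""
    if expr[0] == "identifier":   # x => ...
        return [expr[1]]
    if expr[0] == "sequence":     # (a, b) => ...
        return [p for e in expr[1] for p in expr_to_params(e)]
    raise SyntaxError("Invalid arrow function parameter list")

print(expr_to_params(("sequence", [("identifier", "a"), ("identifier", "b")])))
# ['a', 'b']
try:
    expr_to_params(("call", "f", []))  # (f()) => ... is invalid
except SyntaxError as err:
    print(err)
```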

Object literal vs. destructuring

{x: y}           // ObjectExpression
{x: y} = obj     // ObjectPattern (destructuring)
Same syntax, different semantics. The parser uses cover grammars:
  1. Parse as ObjectExpression
  2. Set has_cover_initializer flag if default values appear
  3. When = follows, reinterpret as ObjectPattern
  4. Reject if used as normal expression and flag is set
case peek(p) {
  LeftBrace -> {
    use #(p, obj_expr) <- result.try(parse_object_literal(p))
    case peek(p) {
      Equal -> // Reinterpret as ObjectPattern
      _ -> case p.has_cover_initializer {
        True -> Error(ShorthandDefaultOutsideDestructuring(pos))
        False -> Ok(#(p, obj_expr))
      }
    }
  }
}

Performance considerations

The parser is designed for correctness, not raw speed:
  • Single-pass: No separate type-checking or name resolution pass
  • Immutable state: Every operation returns a new P (Gleam’s persistent data structures)
  • No memoization: Backtracking (arrow functions) re-parses on failure
Typical performance: ~10,000 lines/second on modern hardware (acceptable for a proof-of-concept).
Future optimizations:
  • Mutable parsing state (unsafe, but faster)
  • Lookahead caching for ambiguous constructs
  • Parallel parsing for module graphs

Entry points

Parse a script

import arc/parser
import arc/ast

case parser.parse(source, parser.Script) {
  Ok(ast.Script(body: stmts)) -> // Use stmts
  Error(err) -> {
    let msg = parser.parse_error_to_string(err)
    let pos = parser.parse_error_pos(err)
    // Report error
  }
}

Parse a module

case parser.parse(source, parser.Module) {
  Ok(ast.Module(body: items)) -> {
    // items: List(ModuleItem)
    // ModuleItem = ImportDeclaration | ExportDeclaration | StatementItem
  }
  Error(err) -> // Handle error
}

AST structure

The parser produces a typed AST defined in src/arc/ast.gleam:
pub type Program {
  Script(body: List(Statement))
  Module(body: List(ModuleItem))
}

pub type Statement {
  ExpressionStatement(Expression)
  VariableDeclaration(kind: VarKind, declarations: List(VariableDeclarator))
  FunctionDeclaration(name: Option(String), params: List(Pattern), body: List(Statement), ...)
  ClassDeclaration(name: Option(String), super_class: Option(Expression), body: List(ClassElement))
  IfStatement(test: Expression, consequent: Statement, alternate: Option(Statement))
  // ... 30+ statement types
}

pub type Expression {
  Identifier(String)
  NumberLiteral(Float)
  StringExpression(String)
  BinaryExpression(operator: BinaryOperator, left: Expression, right: Expression)
  CallExpression(callee: Expression, arguments: List(Expression))
  // ... 40+ expression types
}
Every AST node is a typed variant — no untyped maps or dynamic fields. The compiler guarantees exhaustive pattern matching.

Further reading

  • Compiler: how the AST is compiled to bytecode
  • Lexer (source): token definitions and lexical analysis
  • AST (source): complete AST type definitions
