The Arc parser implements a complete ES2023+ JavaScript parser with strict mode enforcement, module support, and semantic validation. It uses recursive descent for statements and Pratt parsing for expressions.

Parser implementation

Location: src/arc/parser.gleam
Lines: 6,800+
Algorithm: Recursive descent + Pratt parsing

Key features

ES2023+ support

Full support for modern JavaScript including classes, generators, async/await, modules, optional chaining, and nullish coalescing.

Strict mode

Enforces ES2023 strict mode restrictions including reserved words, TDZ checks, and lexical scoping rules.

Module system

Complete import/export support with static analysis and validation.

Error recovery

100+ typed error variants with precise position tracking.

Architecture

The parser is a single-pass, recursive descent parser that consumes a token stream and produces a typed AST:
pub fn parse(source: String, mode: ParseMode) -> Result(ast.Program, ParseError)

Parse modes

pub type ParseMode {
  Script   // Global code (sloppy or strict)
  Module   // Always strict, top-level import/export
}
Modules are always strict mode ("use strict" is implicit). Scripts check for a directive prologue.
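The mode decision can be modeled outside Gleam; here is a minimal Python sketch of the directive-prologue scan (the AST shape and field names are illustrative, not Arc's actual structures):

```python
def is_strict(mode: str, body: list) -> bool:
    """Modules are always strict. Scripts scan the directive prologue:
    the leading run of expression statements whose expression is a
    string literal. The prologue ends at the first other statement."""
    if mode == "module":
        return True
    for stmt in body:
        if stmt.get("type") != "ExpressionStatement" or stmt.get("literal") is None:
            break  # end of the directive prologue
        if stmt["literal"] == "use strict":
            return True
    return False

script = [{"type": "ExpressionStatement", "literal": "use strict"}]
print(is_strict("script", script))  # True
print(is_strict("module", []))      # True
print(is_strict("script", []))      # False
```

Note that any string directive keeps the prologue open, so `"foo"; "use strict";` still enables strict mode.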

Recursive descent parsing

Statements are parsed using recursive descent:
fn parse_statement(p: P) -> Result(#(P, ast.Statement), ParseError) {
  case peek(p) {
    LeftBrace -> parse_block_statement(p)
    Var | Const -> parse_variable_declaration(p)
    If -> parse_if_statement(p)
    While -> parse_while_statement(p)
    For -> parse_for_statement(p)
    Function -> parse_function_declaration(p)
    Class -> parse_class_declaration(p)
    // ... 20+ statement types
  }
}
Each parse function:
  1. Consumes tokens from the parser state
  2. Recursively parses sub-structures
  3. Validates semantic constraints
  4. Returns updated parser state + AST node
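The four steps can be sketched in Python over a toy token list (handler names mirror the Gleam ones, but the token and node shapes are illustrative):

```python
def parse_statement(tokens: list):
    """Dispatch on the lookahead token; each handler consumes its own
    tokens, recursing for sub-structures, and returns (node, rest)."""
    head = tokens[0]
    if head in ("var", "let", "const"):
        return parse_variable_declaration(tokens)
    if head == "{":
        return parse_block_statement(tokens)
    # ... one arm per statement type
    raise SyntaxError(f"Unexpected token {head!r}")

def parse_variable_declaration(tokens):
    # Simplified shape: kind name "=" value ";"
    kind, name, _eq, value, _semi = tokens[:5]
    return {"kind": kind, "name": name, "init": value}, tokens[5:]

def parse_block_statement(tokens):
    body, rest = [], tokens[1:]          # consume "{"
    while rest[0] != "}":
        stmt, rest = parse_statement(rest)  # recurse for sub-structures
        body.append(stmt)
    return {"block": body}, rest[1:]     # consume "}"

node, rest = parse_statement(["{", "let", "x", "=", "1", ";", "}"])
print(node)  # {'block': [{'kind': 'let', 'name': 'x', 'init': '1'}]}
```

Here the "returns updated parser state" step is modeled by returning the remaining token list; Arc instead returns a whole new P record.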

Parser state

The parser threads an immutable P state through all parsing functions:
type P {
  P(
    tokens: List(Token),           // Remaining tokens
    mode: ParseMode,               // Script or Module
    prev_line: Int,                // Last token line (ASI)
    strict: Bool,                  // Strict mode active?
    
    // Context tracking
    function_depth: Int,           // Nested function depth
    loop_depth: Int,               // Nested loop depth
    in_generator: Bool,            // Inside generator?
    in_async: Bool,                // Inside async function?
    
    // Scope analysis
    scope_lexical: Set(String),    // let/const in current block
    scope_var: Set(String),        // var in current function
    scope_params: Set(String),     // function parameters
    
    // Module tracking
    export_names: Set(String),     // Exported names
    import_bindings: Set(String),  // Imported bindings
    
    // ... 30+ fields total
  )
}
The parser state is immutable. Every parsing operation returns a new P with updated fields. This enables backtracking (arrow function ambiguity) and keeps the parser pure.
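The same state-threading discipline can be modeled in Python with a frozen dataclass (fields abbreviated; this is a sketch, not Arc's record):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class P:
    """Immutable parser state: every operation returns a new P."""
    tokens: tuple
    strict: bool = False
    loop_depth: int = 0

def enter_loop(p: P) -> P:
    # Returns a fresh P; the old one stays valid, so a failed
    # speculative parse can simply resume from a saved state.
    return replace(p, loop_depth=p.loop_depth + 1)

p0 = P(tokens=("while", "(", "x", ")"))
p1 = enter_loop(p0)
print(p0.loop_depth, p1.loop_depth)  # 0 1
```

Because `p0` is untouched, backtracking is just "keep the old binding", which is exactly how the arrow-function ambiguity below is resolved.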

Pratt parsing for expressions

Expressions use Pratt parsing (also called “top-down operator precedence”) to handle operator precedence and associativity:
// Precedence levels (lowest to highest)
Assignment       // =, +=, etc.
Conditional      // ? :
LogicalOr        // ||
LogicalAnd       // &&
BitwiseOr        // |
BitwiseXor       // ^
BitwiseAnd       // &
Equality         // ==, ===, !=, !==
Relational       // <, >, <=, >=, in, instanceof
Shift            // <<, >>, >>>
Additive         // +, -
Multiplicative   // *, /, %
Exponentiation   // **
Unary            // !, ~, +, -, typeof, void, delete
Postfix          // ++, --
Call             // f(), f.x, f[x], new f()
Primary          // literals, identifiers

Expression parsing flow

1. Parse primary expression

Start with a primary expression (literal, identifier, (expr), etc.):
fn parse_primary_expression(p: P) -> Result(#(P, ast.Expression), ParseError)
2. Parse left-hand side

Handle member access, calls, and new:
fn parse_lhs_expression(p: P) -> Result(#(P, ast.Expression), ParseError)
Handles:
  • obj.prop, obj[key]
  • fn(), obj.method()
  • new Ctor()
  • fn?.() (optional chaining)
3. Parse binary operators

Climb the precedence ladder using Pratt’s algorithm:
fn parse_binary_expression(
  p: P,
  left: ast.Expression,
  min_precedence: Int
) -> Result(#(P, ast.Expression), ParseError)
While the next operator has precedence ≥ min_precedence:
  1. Consume operator
  2. Parse right operand with higher precedence
  3. Build binary node
  4. Loop
4. Parse assignments

Check for assignment operators and validate LHS:
fn parse_assignment_expression(p: P) -> Result(#(P, ast.Expression), ParseError)
Validates that LHS is assignable (identifier, member expression, or destructuring pattern).

Example: parsing a + b * c

  1. Parse a → primary expression
  2. See + (precedence 10)
  3. Parse RHS with precedence 11
    • Parse b → primary
    • See * (precedence 12 > 11) → consume
    • Parse c → primary
    • Return b * c
  4. Build a + (b * c)
The precedence ensures * binds tighter than + without explicit parentheses.
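The loop described above is classic precedence climbing; here is a minimal Python sketch over a toy token list (the precedence numbers match the worked example, but the table and atom handling are simplified):

```python
PRECEDENCE = {"+": 10, "-": 10, "*": 12, "/": 12}

def parse_expr(tokens: list, min_prec: int = 0):
    """While the next operator binds at least as tightly as min_prec,
    consume it and parse the right operand with a higher threshold so
    tighter operators group first. Returns (tree, remaining_tokens)."""
    left, tokens = tokens[0], tokens[1:]   # primary: a single atom
    while tokens and tokens[0] in PRECEDENCE:
        op = tokens[0]
        prec = PRECEDENCE[op]
        if prec < min_prec:
            break
        # Left-associative: the right side must bind strictly tighter.
        right, tokens = parse_expr(tokens[1:], prec + 1)
        left = (op, left, right)
    return left, tokens

tree, _ = parse_expr(["a", "+", "b", "*", "c"])
print(tree)  # ('+', 'a', ('*', 'b', 'c'))
```

Passing `prec + 1` for the right operand is what makes `+` left-associative: a second `+` at the same level fails the threshold, returns, and is consumed by the outer loop instead.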

Error handling

The parser defines 100+ typed error variants in ParseError:
pub type ParseError {
  ExpectedToken(expected: String, got: String, pos: Int)
  ExpectedIdentifier(pos: Int)
  UnexpectedToken(token: String, pos: Int)
  
  // Strict mode violations
  ReservedWordStrictMode(name: String, pos: Int)
  WithNotAllowedStrictMode(pos: Int)
  DuplicateParameterName(name: String, pos: Int)
  
  // Scope errors
  BreakOutsideLoopOrSwitch(pos: Int)
  ContinueOutsideLoop(pos: Int)
  ReturnOutsideFunction(pos: Int)
  
  // Module errors
  ImportNotTopLevel(pos: Int)
  DuplicateExport(name: String, pos: Int)
  UndeclaredExportBinding(name: String, pos: Int)
  
  // ... 100+ total variants
}
Errors are typed, not strings. Every error has a specific variant with structured data (position, names, etc.). This enables precise error reporting and IDE integration.
Convert errors to human-readable strings:
pub fn parse_error_to_string(error: ParseError) -> String
pub fn parse_error_pos(error: ParseError) -> Int
Example:
ExpectedToken(";", "}", 42)
"Expected ; but got } (at position 42)"

DuplicateParameterName("x", 100)
"Duplicate parameter name 'x' not allowed (at position 100)"

Semantic validation

The parser enforces semantic constraints during parsing:

Strict mode enforcement

// In strict mode, these are always reserved:
"implements" | "interface" | "package" | "private" | "protected" | "public"
Error(ReservedWordStrictMode(name, pos))

// eval/arguments cannot be binding names:
const eval = 42;
Error(StrictModeBindingName("eval", pos))

TDZ tracking

// Accessing let/const before declaration:
console.log(x);  // ReferenceError at runtime
let x = 42;

// Parser tracks declarations in scope_lexical:
case dict.get(state.lexical_globals, name) {
  Ok(JsUninitialized) -> // TDZ violation
  Ok(_value) -> // OK: binding is initialized
  Error(_) -> // not a lexical binding
}
The parser doesn’t reject TDZ violations (they’re runtime errors), but it tracks uninitialized bindings for compiler optimization.

Redeclaration rules

// let/const cannot redeclare in the same scope:
let x = 1;
let x = 2;  // SyntaxError
Error(IdentifierAlreadyDeclared("x", pos))

// var can redeclare:
var x = 1;
var x = 2;  // OK (same binding)
Tracked via scope_lexical and scope_var sets in P.
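A minimal Python model of these redeclaration checks, using mutable sets for brevity where Arc threads immutable ones (error messages are illustrative):

```python
def declare(scope_lexical: set, scope_var: set, kind: str, name: str) -> None:
    """let/const may not collide with any prior declaration in the
    scope; var may re-declare an existing var (it's the same binding),
    but not a lexical name."""
    if kind in ("let", "const"):
        if name in scope_lexical or name in scope_var:
            raise SyntaxError(f"Identifier '{name}' has already been declared")
        scope_lexical.add(name)
    else:  # kind == "var"
        if name in scope_lexical:
            raise SyntaxError(f"Identifier '{name}' has already been declared")
        scope_var.add(name)

lexical, var_scope = set(), set()
declare(lexical, var_scope, "var", "x")
declare(lexical, var_scope, "var", "x")      # OK: same binding
try:
    declare(lexical, var_scope, "let", "x")  # collides with the var binding
except SyntaxError as err:
    print(err)  # Identifier 'x' has already been declared
```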

Context validation

// return outside function:
return 42;
Error(ReturnOutsideFunction(pos))

// break/continue outside loop:
break;
Error(BreakOutsideLoopOrSwitch(pos))

// await outside async function:
await promise;
Error(AwaitInModule(pos))  // unless in module top-level
Tracked via function_depth, loop_depth, in_async, etc.
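These context checks reduce to comparisons against the tracked counters; a Python sketch (the error names here are illustrative, not Arc's exact variants):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ctx:
    """A slice of the context-tracking fields threaded through P."""
    function_depth: int = 0
    loop_depth: int = 0
    in_async: bool = False

def check_statement(kind: str, ctx: Ctx):
    """Return an error name when a statement appears outside the
    context that licenses it, or None if it is allowed."""
    if kind == "return" and ctx.function_depth == 0:
        return "ReturnOutsideFunction"
    if kind == "break" and ctx.loop_depth == 0:
        return "BreakOutsideLoopOrSwitch"
    if kind == "continue" and ctx.loop_depth == 0:
        return "ContinueOutsideLoop"
    if kind == "await" and not ctx.in_async:
        return "AwaitOutsideAsync"
    return None

print(check_statement("return", Ctx()))             # ReturnOutsideFunction
print(check_statement("break", Ctx(loop_depth=1)))  # None
```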

Module validation

// Imports only at top level:
if (condition) {
  import x from "./mod.js";  // SyntaxError
}
Error(ImportNotTopLevel(pos))

// No duplicate exports:
export const x = 1;
export { x };  // SyntaxError
Error(DuplicateExport("x", pos))

// Export references must exist:
export { nonexistent };
Error(UndeclaredExportBinding("nonexistent", pos))
Validated via:
  • export_names: Set(String) — exported names
  • export_local_refs: List(#(String, Int)) — validated after parsing
  • import_bindings: Set(String) — imported names
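The deferred export check (export_local_refs is validated once parsing completes) can be sketched as:

```python
def validate_exports(export_local_refs: list, declared: set) -> list:
    """After parsing, each `export { name }` recorded during the pass
    must refer to a binding declared somewhere at the top level."""
    errors = []
    for name, pos in export_local_refs:
        if name not in declared:
            errors.append(("UndeclaredExportBinding", name, pos))
    return errors

# `export { x }` resolves; `export { nonexistent }` does not:
print(validate_exports([("x", 10), ("nonexistent", 42)], declared={"x"}))
# [('UndeclaredExportBinding', 'nonexistent', 42)]
```

Deferring this check is necessary because `export { f }` may legally precede the declaration of `f` in the same module.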
Module mode changes parsing behavior:
  • Always strict (no need for "use strict")
  • Top-level await allowed
  • import/export keywords enabled
  • Top-level this is undefined (not global)
case p.mode {
  Module -> // Allow await, import, export
  Script -> // Disallow at top level
}

Cover grammars

JavaScript has ambiguous syntax that requires lookahead or backtracking:

Arrow function ambiguity

(x)        // ParenthesizedExpression
(x) => x   // ArrowFunction
The parser can’t know if (x) is a grouped expression or arrow params until it sees =>. Solution: Parse as expression, then reinterpret:
use #(p, expr) <- result.try(parse_assignment_expression(p))
case peek(p) {
  Arrow -> {
    // Reinterpret expr as parameter list
    use params <- result.try(expr_to_params(expr))
    // Parse arrow body
  }
  _ -> Ok(#(p, expr))  // Just a normal expression
}
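The reinterpretation step can be modeled in Python on a toy tuple AST (shapes are illustrative; real parameter lists also admit defaults, rest elements, and destructuring patterns):

```python
def expr_to_params(expr) -> list:
    """Reinterpret an already-parsed expression as an arrow-function
    parameter list. Only certain shapes are valid parameters; anything
    else is a syntax error discovered at the `=>`."""
    if expr[0] == "identifier":   # x => ...
        return [expr[1]]
    if expr[0] == "sequence":     # (a, b) => ...
        return [p for e in expr[1] for p in expr_to_params(e)]
    raise SyntaxError("Invalid arrow function parameter list")

print(expr_to_params(("sequence", [("identifier", "a"), ("identifier", "b")])))
# ['a', 'b']
try:
    expr_to_params(("call", "f", []))  # (f()) => ... is invalid
except SyntaxError as err:
    print(err)
```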

Object literal vs. destructuring

{x: y}           // ObjectExpression
{x: y} = obj     // ObjectPattern (destructuring)
Same syntax, different semantics. The parser uses cover grammars:
  1. Parse as ObjectExpression
  2. Set has_cover_initializer flag if default values appear
  3. When = follows, reinterpret as ObjectPattern
  4. Reject if used as normal expression and flag is set
case peek(p) {
  LeftBrace -> {
    use #(p, obj_expr) <- result.try(parse_object_literal(p))
    case peek(p) {
      Equal -> // Reinterpret as ObjectPattern
      _ -> case p.has_cover_initializer {
        True -> Error(ShorthandDefaultOutsideDestructuring(pos))
        False -> Ok(#(p, obj_expr))
      }
    }
  }
}

Performance considerations

The parser is designed for correctness, not raw speed:
  • Single-pass: No separate type-checking or name resolution pass
  • Immutable state: Every operation returns a new P (Gleam’s persistent data structures)
  • No memoization: Backtracking (arrow functions) re-parses on failure
Typical performance: ~10,000 lines/second on modern hardware (acceptable for a proof-of-concept).
Future optimizations:
  • Mutable parsing state (unsafe, but faster)
  • Lookahead caching for ambiguous constructs
  • Parallel parsing for module graphs

Entry points

Parse a script

import arc/parser
import arc/ast

case parser.parse(source, parser.Script) {
  Ok(ast.Script(body: stmts)) -> // Use stmts
  Error(err) -> {
    let msg = parser.parse_error_to_string(err)
    let pos = parser.parse_error_pos(err)
    // Report error
  }
}

Parse a module

case parser.parse(source, parser.Module) {
  Ok(ast.Module(body: items)) -> {
    // items: List(ModuleItem)
    // ModuleItem = ImportDeclaration | ExportDeclaration | StatementItem
  }
  Error(err) -> // Handle error
}

AST structure

The parser produces a typed AST defined in src/arc/ast.gleam:
pub type Program {
  Script(body: List(Statement))
  Module(body: List(ModuleItem))
}

pub type Statement {
  ExpressionStatement(Expression)
  VariableDeclaration(kind: VarKind, declarations: List(VariableDeclarator))
  FunctionDeclaration(name: Option(String), params: List(Pattern), body: List(Statement), ...)
  ClassDeclaration(name: Option(String), super_class: Option(Expression), body: List(ClassElement))
  IfStatement(test: Expression, consequent: Statement, alternate: Option(Statement))
  // ... 30+ statement types
}

pub type Expression {
  Identifier(String)
  NumberLiteral(Float)
  StringExpression(String)
  BinaryExpression(operator: BinaryOperator, left: Expression, right: Expression)
  CallExpression(callee: Expression, arguments: List(Expression))
  // ... 40+ expression types
}
Every AST node is a typed variant — no untyped maps or dynamic fields. The compiler guarantees exhaustive pattern matching.

Further reading

  • Compiler: how the AST is compiled to bytecode
  • Lexer (source): token definitions and lexical analysis
  • AST (source): complete AST type definitions
