Recursive descent parser for ES2023+ JavaScript with Pratt expression handling
The Arc parser implements a complete ES2023+ JavaScript parser with strict mode enforcement, module support, and semantic validation. It uses recursive descent for statements and Pratt parsing for expressions.
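To make the Pratt half of that split concrete, here is a minimal binding-power loop, sketched in JavaScript rather than in Arc's own implementation language. The token format (an array of numbers and operator strings), the `BINDING_POWER` table, and `parseExpression` are hypothetical names chosen for this illustration and are not taken from Arc.

```javascript
// Hypothetical binding powers; a higher number binds tighter.
const BINDING_POWER = new Map([["+", 10], ["-", 10], ["*", 20], ["/", 20]]);

// Parse `tokens` starting at `pos`. Returns [ast, nextPos].
// Only operators that bind tighter than `minPower` are consumed here.
function parseExpression(tokens, pos = 0, minPower = 0) {
  let left = tokens[pos];
  if (typeof left !== "number") {
    throw new SyntaxError(`Expected a number at position ${pos}`);
  }
  pos += 1;
  while (pos < tokens.length) {
    const op = tokens[pos];
    const power = BINDING_POWER.get(op);
    if (power === undefined) {
      throw new SyntaxError(`Unexpected token ${op} at position ${pos}`);
    }
    // An operator of equal or lower power belongs to an enclosing call,
    // which is what makes these operators left-associative.
    if (power <= minPower) break;
    const [right, next] = parseExpression(tokens, pos + 1, power);
    left = { op, left, right };
    pos = next;
  }
  return [left, pos];
}
```

With this table, `parseExpression([1, "+", 2, "*", 3])` nests the multiplication under the addition, because `*` has the higher binding power.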
The parser threads an immutable P state through all parsing functions:
    type P {
      P(
        tokens: List(Token),            // Remaining tokens
        mode: ParseMode,                // Script or Module
        prev_line: Int,                 // Last token line (ASI)
        strict: Bool,                   // Strict mode active?

        // Context tracking
        function_depth: Int,            // Nested function depth
        loop_depth: Int,                // Nested loop depth
        in_generator: Bool,             // Inside generator?
        in_async: Bool,                 // Inside async function?

        // Scope analysis
        scope_lexical: Set(String),     // let/const in current block
        scope_var: Set(String),         // var in current function
        scope_params: Set(String),      // function parameters

        // Module tracking
        export_names: Set(String),      // Exported names
        import_bindings: Set(String),   // Imported bindings

        // ... 30+ fields total
      )
    }
The parser state is immutable. Every parsing operation returns a new P with updated fields. This enables backtracking (arrow function ambiguity) and keeps the parser pure.
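The following JavaScript sketch illustrates why an immutable state makes backtracking cheap. It is not Arc's code: the two-field state, `makeState`, `expect`, `firstMatch`, and the two alternatives are hypothetical stand-ins for the much larger P record.

```javascript
// Hypothetical, heavily reduced parser state, frozen so it cannot be mutated.
function makeState(tokens, pos = 0) {
  return Object.freeze({ tokens, pos });
}

// Consume one token of the expected kind.
// Returns a NEW state on success, or null on mismatch.
function expect(state, kind) {
  if (state.tokens[state.pos] !== kind) return null;
  return makeState(state.tokens, state.pos + 1);
}

// Try each alternative from the same starting state. A failed attempt needs
// no cleanup, because it never modified `state`.
function firstMatch(state, alternatives) {
  for (const alt of alternatives) {
    const next = alt(state);
    if (next !== null) return next;
  }
  return null;
}

// Hypothetical alternatives: "( x ) =>" (arrow head) or "( x )" (parenthesized).
const arrowHead = (s) =>
  ["(", "x", ")", "=>"].reduce((st, kind) => st && expect(st, kind), s);
const parenthesized = (s) =>
  ["(", "x", ")"].reduce((st, kind) => st && expect(st, kind), s);
```

Calling `firstMatch(start, [arrowHead, parenthesized])` retries the second alternative from the untouched `start` value after the first one fails.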
Errors are typed, not strings. Every error has a specific variant with structured data (position, names, etc.). This enables precise error reporting and IDE integration.
    ExpectedToken(";", "}", 42)
      → "Expected ; but got } (at position 42)"

    DuplicateParameterName("x", 100)
      → "Duplicate parameter name 'x' not allowed (at position 100)"
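As a rough illustration of structured errors with a separate rendering step, the sketch below models the two variants above as tagged JavaScript objects. The object shape (`kind`, `expected`, `got`, `name`, `pos`) and the `formatError` function are assumptions made for this example, not Arc's actual types.

```javascript
// Hypothetical constructors mirroring the two variants shown above.
const ExpectedToken = (expected, got, pos) =>
  ({ kind: "ExpectedToken", expected, got, pos });
const DuplicateParameterName = (name, pos) =>
  ({ kind: "DuplicateParameterName", name, pos });

// Render a structured error as a human-readable message.
// Tools such as an IDE can read the fields directly instead of parsing text.
function formatError(err) {
  switch (err.kind) {
    case "ExpectedToken":
      return `Expected ${err.expected} but got ${err.got} (at position ${err.pos})`;
    case "DuplicateParameterName":
      return `Duplicate parameter name '${err.name}' not allowed (at position ${err.pos})`;
    default:
      throw new Error(`Unknown error kind: ${err.kind}`);
  }
}
```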
    // In strict mode, these are always reserved:
    "implements" | "interface" | "package" | "private" | "protected" | "public"
      → Error(ReservedWordStrictMode(name, pos))

    // eval/arguments cannot be binding names:
    const eval = 42;
      → Error(StrictModeBindingName("eval", pos))
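A check along these lines could look like the JavaScript sketch below. The function name `checkBindingName` and the error object shape are hypothetical, and the reserved set is limited to the six words listed above.

```javascript
// The six words named above; this sketch does not try to be exhaustive.
const STRICT_RESERVED = new Set([
  "implements", "interface", "package", "private", "protected", "public",
]);

// Check a binding name. Returns null when the name is acceptable,
// otherwise a structured error object (shape is an assumption).
function checkBindingName(name, pos, strict) {
  if (!strict) return null; // both rules apply only in strict mode
  if (STRICT_RESERVED.has(name)) {
    return { kind: "ReservedWordStrictMode", name, pos };
  }
  if (name === "eval" || name === "arguments") {
    return { kind: "StrictModeBindingName", name, pos };
  }
  return null;
}
```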
Temporal Dead Zone (TDZ)
    // Accessing let/const before declaration:
    console.log(x); // ReferenceError at runtime
    let x = 42;

    // Parser tracks declarations in scope_lexical:
    case dict.get(state.lexical_globals, name) {
      Ok(JsUninitialized) -> Error(/* TDZ violation */)
      Ok(value) -> // OK
    }
The parser doesn’t reject TDZ violations (they’re runtime errors), but it tracks uninitialized bindings for compiler optimization.
Duplicate declarations
    // let/const cannot redeclare in the same scope:
    let x = 1;
    let x = 2; // SyntaxError
      → Error(IdentifierAlreadyDeclared("x", pos))

    // var can redeclare:
    var x = 1;
    var x = 2; // OK (same binding)
Tracked via scope_lexical and scope_var sets in P.
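The two-set bookkeeping can be sketched in JavaScript as follows. This assumes a single flat scope and ignores nested blocks and hoisting. The `declare` function and the `{ lexical, vars }` scope shape are hypothetical. The sketch also rejects a `let`/`const` that collides with an existing `var` and vice versa, which is a SyntaxError in JavaScript when both sit in the same scope.

```javascript
// Declare `name` in a scope of the form { lexical: Set, vars: Set }.
// Returns { ok: true, scope } with a NEW scope, or { ok: false, error }.
function declare(scope, kind, name, pos) {
  const isLexical = kind === "let" || kind === "const";
  // let/const clash with any earlier declaration; var clashes only with let/const.
  const clash = isLexical
    ? scope.lexical.has(name) || scope.vars.has(name)
    : scope.lexical.has(name);
  if (clash) {
    return { ok: false, error: { kind: "IdentifierAlreadyDeclared", name, pos } };
  }
  // Copy the sets so the caller's scope stays untouched.
  const lexical = new Set(scope.lexical);
  const vars = new Set(scope.vars);
  (isLexical ? lexical : vars).add(name);
  return { ok: true, scope: { lexical, vars } };
}
```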
    // Imports only at top level:
    if (condition) {
      import x from "./mod.js"; // SyntaxError
    }
      → Error(ImportNotTopLevel(pos))

    // No duplicate exports:
    export const x = 1;
    export { x }; // SyntaxError
      → Error(DuplicateExport("x", pos))

    // Export references must exist:
    export { nonexistent };
      → Error(UndeclaredExportBinding("nonexistent", pos))
Validated via:
export_names: Set(String) — exported names
export_local_refs: List(#(String, Int)) — validated after parsing
import_bindings: Set(String) — imported names
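The sketch below shows, in JavaScript, how these three collections might cooperate. Deferring the reference check until after parsing is plausible because JavaScript allows a declaration to appear after the `export { ... }` statement that names it. The functions `addExport` and `checkExportRefs` and the `declared` parameter are hypothetical names, not Arc's API.

```javascript
// Record an exported name; a repeated name is a DuplicateExport error.
function addExport(exportNames, name, pos) {
  if (exportNames.has(name)) {
    return { ok: false, error: { kind: "DuplicateExport", name, pos } };
  }
  return { ok: true, exportNames: new Set(exportNames).add(name) };
}

// After parsing, every `export { name }` must refer to a binding that is
// either declared in the module or imported into it.
// `exportLocalRefs` is a list of [name, position] pairs.
function checkExportRefs(exportLocalRefs, declared, importBindings) {
  const errors = [];
  for (const [name, pos] of exportLocalRefs) {
    if (!declared.has(name) && !importBindings.has(name)) {
      errors.push({ kind: "UndeclaredExportBinding", name, pos });
    }
  }
  return errors;
}
```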
Module/script mode separation
Module mode changes parsing behavior:
Always strict (no need for "use strict")
Top-level await allowed
import/export keywords enabled
Top-level this is undefined (not global)
    case p.mode {
      Module -> // Allow await, import, export
      Script -> // Disallow at top level
    }
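Expressed as plain predicates, the mode rules listed above might look like this JavaScript sketch. The helper names and the string values `"Module"` and `"Script"` are assumptions for illustration. The keyword check covers only `import`/`export` declarations and top-level `await` as described in the list.

```javascript
// Hypothetical helper: may `keyword` start a top-level construct in this mode?
function allowedAtTopLevel(mode, keyword) {
  const moduleOnly = new Set(["import", "export", "await"]);
  if (!moduleOnly.has(keyword)) return true;
  return mode === "Module";
}

// Module code is always strict; a script is strict only when it carries
// a "use strict" directive.
function isStrict(mode, hasUseStrictDirective) {
  return mode === "Module" || hasUseStrictDirective;
}
```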
    (x)        // ParenthesizedExpression
    (x) => x   // ArrowFunction
The parser can’t know whether (x) is a grouped expression or an arrow function’s parameter list until it sees =>.

Solution: parse it as an expression, then reinterpret:
    use #(p, expr) <- result.try(parse_assignment_expression(p))
    case peek(p) {
      Arrow -> {
        // Reinterpret expr as parameter list
        use params <- result.try(expr_to_params(expr))
        // Parse arrow body
      }
      _ -> Ok(#(p, expr))  // Just a normal expression
    }
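The reinterpretation step can be sketched in JavaScript as shown here. The AST node shapes (`Identifier`, `Sequence`, `Binary`) and the name `exprToParams` are hypothetical, and the sketch accepts only plain identifiers. Default values, destructuring patterns, rest parameters, and duplicate-name checks are left out.

```javascript
// Reinterpret an already-parsed expression as an arrow parameter list.
// Returns an array of parameter names, or null when the expression cannot
// be read as parameters.
function exprToParams(expr) {
  // (a, b) was parsed as a comma expression; (a) as a single expression.
  const items = expr.type === "Sequence" ? expr.items : [expr];
  const params = [];
  for (const item of items) {
    // Anything other than a bare identifier, e.g. (a + b), is rejected.
    if (item.type !== "Identifier") return null;
    params.push(item.name);
  }
  return params;
}
```

A null result corresponds to the error branch of `result.try` in the snippet above: the expression was valid on its own but cannot precede `=>`.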