Pipeline architecture
Arc uses a multi-phase pipeline to transform JavaScript source code into executable bytecode:
Phase 1: Lexical analysis
The lexer (src/arc/lexer.gleam) scans the source string and produces a token stream. It handles:
- ES2023+ keywords and operators
- String literals with escape sequences
- Numeric literals (decimal, binary, octal, hex)
- Regular expression literals
- Template literals
- Automatic semicolon insertion
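As an illustration of the numeric-literal case, here is a minimal TypeScript sketch of scanning decimal, binary, octal, and hex literals. It is not taken from src/arc/lexer.gleam: the NumberToken shape and the scanNumber function are hypothetical names, and numeric separators, BigInt suffixes, and error reporting are left out.

```typescript
// Hypothetical token shape for this sketch; Arc's real tokens are defined in Gleam.
type NumberToken = { kind: "number"; value: number; end: number };

// Prefixed forms (0b, 0o, 0x) are tried before plain decimals so that
// "0x1F" is not read as the decimal "0" followed by an identifier.
const NUMERIC_LITERAL =
  /^(?:0[bB][01]+|0[oO][0-7]+|0[xX][0-9a-fA-F]+|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)/;

// Scans one numeric literal starting at `pos`. Returns null if the text at
// `pos` does not start a number. `end` is the index just past the literal.
function scanNumber(src: string, pos: number): NumberToken | null {
  const match = NUMERIC_LITERAL.exec(src.slice(pos));
  if (match === null) return null;
  const text = match[0];
  // Number() understands the 0b / 0o / 0x prefixes as well as decimals.
  return { kind: "number", value: Number(text), end: pos + text.length };
}
```

Calling scanNumber("x = 0b101;", 4) yields a token with value 5 that ends at index 9, so the lexer can continue scanning from the semicolon.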
Phase 2: Parsing
The parser (src/arc/parser.gleam) consumes tokens and builds an Abstract Syntax Tree (AST). See Parser details for the full implementation.
Key features:
- Recursive descent parsing for statements
- Pratt parsing for expression precedence
- ES2023+ strict mode enforcement
- Full module and script support
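To make the Pratt parsing bullet concrete, the following TypeScript sketch parses a flat list of number and operator tokens using binding powers. It is a generic illustration of the technique, not Arc's parser: ExprNode, BINDING_POWER, parseExpression, and toSExpr are hypothetical names, and only four left-associative binary operators are modelled.

```typescript
// Hypothetical AST shape for this sketch; not Arc's ast module.
type ExprNode =
  | { type: "num"; value: number }
  | { type: "binary"; op: string; left: ExprNode; right: ExprNode };

// Higher binding power binds tighter.
const BINDING_POWER: Record<string, number> = { "+": 10, "-": 10, "*": 20, "/": 20 };

function parseExpression(tokens: string[]): ExprNode {
  let pos = 0;

  // Parses an expression whose operators all bind tighter than minPower.
  function parse(minPower: number): ExprNode {
    const first = tokens[pos++];
    if (first === undefined || Number.isNaN(Number(first))) {
      throw new SyntaxError(`expected a number, got ${String(first)}`);
    }
    let left: ExprNode = { type: "num", value: Number(first) };

    for (;;) {
      const op = tokens[pos];
      if (op === undefined) break;
      const power = BINDING_POWER[op];
      // Stop when the next operator binds no tighter than our caller's
      // operator; using <= makes equal-precedence operators left-associative.
      if (power === undefined || power <= minPower) break;
      pos++;
      const right = parse(power);
      left = { type: "binary", op, left, right };
    }
    return left;
  }

  const tree = parse(0);
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token ${String(tokens[pos])}`);
  return tree;
}

// Renders the tree as an S-expression so the grouping is visible.
function toSExpr(e: ExprNode): string {
  return e.type === "num" ? String(e.value) : `(${e.op} ${toSExpr(e.left)} ${toSExpr(e.right)})`;
}
```

With this sketch, the tokens 1 + 2 * 3 render as (+ 1 (* 2 3)), and 8 - 3 - 2 renders as (- (- 8 3) 2).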
Phase 3: Compilation
The compiler (src/arc/compiler.gleam) transforms the AST into executable bytecode through a three-phase pipeline:
- Emit → AST to symbolic IR (EmitterOp)
- Scope → Resolve variables to local indices
- Resolve → Convert labels to absolute addresses
Phase 4: Execution
The VM (src/arc/vm/vm.gleam) executes bytecode using a stack machine model. See VM details.
Core components
Lexer
Location: src/arc/lexer.gleam
Converts raw source text into tokens. Handles all ES2023+ token types including:
- Keywords (function, class, async, etc.)
- Operators (+, ===, ?., ??, etc.)
- Literals (strings, numbers, regex, templates)
- Identifiers and contextual keywords
Parser
Location: src/arc/parser.gleam
Builds a complete AST with semantic validation:
- Strict mode enforcement
- TDZ (Temporal Dead Zone) tracking
- Scope analysis (lexical vs var)
- Export/import validation for modules
Output: an AST (ast.Program) with variants for scripts and modules.
Compiler
Location: src/arc/compiler.gleam, src/arc/compiler/emit.gleam, src/arc/compiler/scope.gleam, src/arc/compiler/resolve.gleam
Three-phase bytecode compiler:
Phase 1: Emit
File: src/arc/compiler/emit.gleam
Walks the AST and produces symbolic IR:
- Variable references use string names (IrScopeGetVar("x"))
- Jump targets use label IDs (IrJump(42))
- Scope markers track variable declarations
Output: EmitterOp (IR + scope metadata)
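The emit phase can be pictured with a small TypeScript sketch. Only IrScopeGetVar, IrJump, and IrLabel are names taken from this page; the Ast shape, the other op names (IrPushNumber, IrScopePutVar, IrJumpIfFalse, DeclareVar), and the emit function are assumptions made up for illustration and may not match emit.gleam.

```typescript
// Tiny hypothetical AST: numbers, variable reads, declarations, conditionals.
type Ast =
  | { type: "num"; value: number }
  | { type: "var"; name: string }
  | { type: "let"; name: string; init: Ast }
  | { type: "cond"; test: Ast; consequent: Ast; alternate: Ast };

// Symbolic IR: variables are still names and jump targets are still label IDs.
type EmitterOp =
  | { op: "IrPushNumber"; value: number } // hypothetical
  | { op: "IrScopeGetVar"; name: string }
  | { op: "IrScopePutVar"; name: string } // hypothetical
  | { op: "IrJump"; label: number }
  | { op: "IrJumpIfFalse"; label: number } // hypothetical
  | { op: "IrLabel"; label: number }
  | { op: "DeclareVar"; name: string }; // hypothetical scope marker

function emit(root: Ast): EmitterOp[] {
  const out: EmitterOp[] = [];
  let nextLabel = 0;

  const walk = (node: Ast): void => {
    switch (node.type) {
      case "num":
        out.push({ op: "IrPushNumber", value: node.value });
        break;
      case "var":
        out.push({ op: "IrScopeGetVar", name: node.name });
        break;
      case "let":
        // The marker tells the later scope phase where the binding starts.
        out.push({ op: "DeclareVar", name: node.name });
        walk(node.init);
        out.push({ op: "IrScopePutVar", name: node.name });
        break;
      case "cond": {
        // Fresh label IDs; their addresses are unknown until label resolution.
        const elseLabel = nextLabel++;
        const endLabel = nextLabel++;
        walk(node.test);
        out.push({ op: "IrJumpIfFalse", label: elseLabel });
        walk(node.consequent);
        out.push({ op: "IrJump", label: endLabel });
        out.push({ op: "IrLabel", label: elseLabel });
        walk(node.alternate);
        out.push({ op: "IrLabel", label: endLabel });
        break;
      }
    }
  };

  walk(root);
  return out;
}
```

The point of the sketch is that the emitter never needs to know slot numbers or instruction addresses, which keeps the AST walk simple.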
Phase 2: Scope resolution
File: src/arc/compiler/scope.gleam
Resolves symbolic variable names to local slot indices:
- Tracks block scopes and function scopes
- Identifies captured variables (closures)
- Boxes captured variables for shared mutation
- Converts IrScopeGetVar(name) → IrGetLocal(index) or IrGetGlobal(name)
Output: IrOp (no scope markers, local indices assigned)
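A simplified TypeScript sketch of name resolution follows. IrScopeGetVar, IrGetLocal, and IrGetGlobal come from this page; the EnterScope, LeaveScope, and DeclareVar markers, the BlockScope type, and resolveScopes are hypothetical. Function scopes, closure capture, and slot reuse are deliberately ignored.

```typescript
// Input ops: hypothetical scope markers plus symbolic variable reads.
type ScopedOp =
  | { op: "EnterScope" }
  | { op: "LeaveScope" }
  | { op: "DeclareVar"; name: string }
  | { op: "IrScopeGetVar"; name: string };

// Output ops: every read is either a local slot or a global lookup.
type ResolvedOp =
  | { op: "IrGetLocal"; index: number }
  | { op: "IrGetGlobal"; name: string };

// A chain of block scopes, innermost first.
type BlockScope = { slots: Map<string, number>; parent: BlockScope | null };

function resolveScopes(ops: ScopedOp[]): ResolvedOp[] {
  let scope: BlockScope = { slots: new Map(), parent: null };
  let nextSlot = 0;
  const out: ResolvedOp[] = [];

  // Walks outward from the innermost scope until the name is found.
  const lookup = (name: string): number | undefined => {
    for (let s: BlockScope | null = scope; s !== null; s = s.parent) {
      const index = s.slots.get(name);
      if (index !== undefined) return index;
    }
    return undefined;
  };

  for (const ins of ops) {
    switch (ins.op) {
      case "EnterScope":
        scope = { slots: new Map(), parent: scope };
        break;
      case "LeaveScope":
        if (scope.parent !== null) scope = scope.parent;
        break;
      case "DeclareVar":
        scope.slots.set(ins.name, nextSlot++);
        break;
      case "IrScopeGetVar": {
        const index = lookup(ins.name);
        if (index === undefined) {
          out.push({ op: "IrGetGlobal", name: ins.name }); // not declared locally
        } else {
          out.push({ op: "IrGetLocal", index });
        }
        break;
      }
    }
  }
  return out;
}
```

Note that the scope markers are consumed here and never reach the output, matching the "no scope markers" property described above.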
Phase 3: Label resolution
File: src/arc/compiler/resolve.gleam
Converts label IDs to absolute PC addresses:
- Two-pass algorithm (collect labels, then resolve)
- Converts IrJump(label_id) → Jump(pc_address)
- Drops IrLabel markers
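The two-pass algorithm can be sketched in TypeScript as follows. IrJump, IrLabel, and Jump are names from this page; the IrOther and Other stand-ins for all remaining instructions, and the resolveLabels function, are hypothetical.

```typescript
type LabelledOp =
  | { op: "IrLabel"; label: number }
  | { op: "IrJump"; label: number }
  | { op: "IrOther"; text: string }; // stand-in for every non-jump instruction

type FinalOp =
  | { op: "Jump"; pc: number }
  | { op: "Other"; text: string };

function resolveLabels(ir: LabelledOp[]): FinalOp[] {
  // Pass 1: record the address each label will have once labels are dropped.
  // Labels occupy no space in the output, so they do not advance pc.
  const labelPc = new Map<number, number>();
  let pc = 0;
  for (const ins of ir) {
    if (ins.op === "IrLabel") labelPc.set(ins.label, pc);
    else pc++;
  }

  // Pass 2: rewrite jumps to absolute addresses and drop the label markers.
  const out: FinalOp[] = [];
  for (const ins of ir) {
    if (ins.op === "IrLabel") continue;
    if (ins.op === "IrJump") {
      const target = labelPc.get(ins.label);
      if (target === undefined) throw new Error(`unknown label ${ins.label}`);
      out.push({ op: "Jump", pc: target });
    } else {
      out.push({ op: "Other", text: ins.text });
    }
  }
  return out;
}
```

Two passes are needed because a forward jump refers to a label whose address is not known when the jump is first seen.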
Output: FuncTemplate (ready for VM execution)
VM
Location: src/arc/vm/vm.gleam
Stack-based bytecode interpreter:
- 80+ opcodes covering ES2023+ semantics
- Heap-allocated objects and closures
- Generator and async/await support
- Promise job queue for microtasks
The VM's runtime state is held in State, containing:
- Stack: operand stack for expression evaluation
- Locals: local variable slots (indexed array)
- Heap: garbage-collected object storage
- Call stack: saved frames for function calls
- Try stack: exception handler frames
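The interplay of operand stack, locals, and try stack can be shown with a toy interpreter in TypeScript. This is a sketch under stated assumptions, not Arc's VM: all opcode names, the VmState and TryFrame shapes, and the run function are hypothetical, values are restricted to numbers, and the heap and call stack are omitted.

```typescript
type Instr =
  | { op: "Push"; value: number }
  | { op: "Add" }
  | { op: "GetLocal"; index: number }
  | { op: "PutLocal"; index: number }
  | { op: "PushTry"; handlerPc: number }
  | { op: "PopTry" }
  | { op: "Throw" };

// A handler frame remembers where to jump and how deep the stack was.
type TryFrame = { handlerPc: number; stackDepth: number };

type VmState = { stack: number[]; locals: number[]; tryStack: TryFrame[]; pc: number };

function run(code: Instr[], localCount: number): VmState {
  const state: VmState = {
    stack: [],
    locals: new Array<number>(localCount).fill(0),
    tryStack: [],
    pc: 0,
  };
  const pop = (): number => {
    const v = state.stack.pop();
    if (v === undefined) throw new Error("stack underflow");
    return v;
  };

  while (state.pc < code.length) {
    const ins = code[state.pc++];
    if (ins === undefined) break;
    switch (ins.op) {
      case "Push": state.stack.push(ins.value); break;
      case "Add": { const b = pop(); const a = pop(); state.stack.push(a + b); break; }
      case "GetLocal": state.stack.push(state.locals[ins.index] ?? 0); break;
      case "PutLocal": state.locals[ins.index] = pop(); break;
      case "PushTry":
        state.tryStack.push({ handlerPc: ins.handlerPc, stackDepth: state.stack.length });
        break;
      case "PopTry": state.tryStack.pop(); break;
      case "Throw": {
        const thrown = pop();
        const frame = state.tryStack.pop();
        if (frame === undefined) throw new Error(`uncaught exception: ${thrown}`);
        state.stack.length = frame.stackDepth; // discard operands pushed inside the try
        state.stack.push(thrown); // the handler finds the thrown value on top
        state.pc = frame.handlerPc;
        break;
      }
    }
  }
  return state;
}
```

Unwinding only needs the saved stack depth and a handler address, which is part of why exception handling stays simple on a stack machine.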
Built-in objects
Location: src/arc/vm/builtins/*.gleam
Native implementations of JavaScript built-ins:
- Object, Array, String, Number, Boolean
- Function, Symbol, Promise
- Math, JSON, Error
- Map, Set, WeakMap, WeakSet
- RegExp
- Arc namespace (BEAM interop): Arc.spawn, Arc.send, Arc.receive
Data flow example
Here’s how const x = 40 + 2 flows through the pipeline:
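One way to picture the stages is to write out a plausible artifact for each of them. The TypeScript sketch below is illustrative only: the node shapes and every op name except IrScopePutVar's naming pattern are assumptions, not Arc's actual token, AST, or opcode definitions, and the tiny execute function stands in for the VM.

```typescript
// Phase 1, lexer: the source string becomes a flat token stream.
const tokens: string[] = ["const", "x", "=", "40", "+", "2"];

// Phase 2, parser: tokens become a tree (hypothetical node shapes).
const ast = {
  type: "VariableDeclaration",
  kind: "const",
  name: "x",
  init: { type: "BinaryExpression", operator: "+", left: 40, right: 2 },
};

// Phase 3, emit step: symbolic IR that still refers to x by name.
type SymbolicOp =
  | { op: "DeclareVar"; name: string }
  | { op: "PushNumber"; value: number }
  | { op: "Add" }
  | { op: "IrScopePutVar"; name: string };
const emitted: SymbolicOp[] = [
  { op: "DeclareVar", name: "x" },
  { op: "PushNumber", value: 40 },
  { op: "PushNumber", value: 2 },
  { op: "Add" },
  { op: "IrScopePutVar", name: "x" },
];

// Phase 3, scope step: x is assigned local slot 0 and the marker disappears.
// Phase 3, resolve step: there are no jumps here, so nothing else changes.
type Bytecode =
  | { op: "PushNumber"; value: number }
  | { op: "Add" }
  | { op: "PutLocal"; index: number };
const bytecode: Bytecode[] = [
  { op: "PushNumber", value: 40 },
  { op: "PushNumber", value: 2 },
  { op: "Add" },
  { op: "PutLocal", index: 0 },
];

// Phase 4, VM: a minimal stack machine that returns the final locals.
function execute(code: Bytecode[]): number[] {
  const stack: number[] = [];
  const locals: number[] = [];
  for (const ins of code) {
    if (ins.op === "PushNumber") stack.push(ins.value);
    else if (ins.op === "Add") stack.push((stack.pop() ?? 0) + (stack.pop() ?? 0));
    else locals[ins.index] = stack.pop() ?? 0;
  }
  return locals;
}

const finalLocals = execute(bytecode); // slot 0 now holds the value of x
```

After execution, local slot 0 (the slot assigned to x) holds 42.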
Key design decisions
Immutable compilation artifacts
All compiler phases produce immutable data structures. The compiler is pure: the same AST always produces identical bytecode.
Three-phase compilation
Separating emit/scope/resolve allows:
- Clean separation of concerns (AST → IR → locals → addresses)
- Easy debugging (inspect IR at each phase)
- Composable optimizations (could insert passes between phases)
Closure capture via boxing
Variables captured by nested closures are boxed: stored in a heap-allocated BoxSlot instead of directly in locals. Both parent and child hold references to the same box, enabling shared mutable state across closure boundaries (true JavaScript semantics).
Example: when a function outer declares x and a nested function inner uses it, the compiler (collect_all_captured_vars in compiler.gleam:346) detects that x is used by inner, emits BoxLocal(x_index) in outer, and gives inner a capture descriptor pointing to the boxed slot.
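The effect of boxing can be modelled in a few lines of TypeScript. BoxSlot and BoxLocal are names from this page; FrameModel, boxLocal, readLocal, and the way captures are represented are hypothetical simplifications, with values restricted to numbers.

```typescript
// Hypothetical model of a heap-allocated box: one cell, shared by reference.
type BoxSlot = { value: number };

// A frame's local slots hold either a plain value or a box.
type FrameModel = { locals: Array<number | BoxSlot> };

// Models BoxLocal(index): move the plain value in a slot into a box.
function boxLocal(frame: FrameModel, index: number): BoxSlot {
  const current = frame.locals[index];
  if (current === undefined) throw new Error(`no local at index ${index}`);
  const box: BoxSlot = typeof current === "number" ? { value: current } : current;
  frame.locals[index] = box;
  return box;
}

// Reads a local, looking through the box if the slot has been boxed.
function readLocal(frame: FrameModel, index: number): number {
  const slot = frame.locals[index];
  if (slot === undefined) throw new Error(`no local at index ${index}`);
  return typeof slot === "number" ? slot : slot.value;
}

const outerFrame: FrameModel = { locals: [1] }; // outer declares x = 1 in slot 0
const xBox = boxLocal(outerFrame, 0); // outer executes BoxLocal(x_index)
const innerCaptures = { x: xBox }; // inner's capture resolves to the same box

innerCaptures.x.value += 1; // inner runs x += 1
const seenByOuter = readLocal(outerFrame, 0); // outer observes the write: 2
```

Had inner received a copy of the number instead of the box, outer would still read 1 here, which would not match JavaScript closure semantics.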
Stack machine model
The VM uses a stack machine (no register allocation). This simplifies:
- Code generation (no register pressure)
- Expression evaluation (natural postfix form)
- Exception unwinding (stack depth tracking)
BEAM integration
Arc runs on the Erlang VM (BEAM), enabling:
- Process spawning: Arc.spawn(fn) creates a new BEAM process
- Message passing: Arc.send(pid, msg), Arc.receive()
- Actor model: JavaScript code can participate in OTP supervision trees
- Distributed computing: Transparent message passing across nodes
Values that cross the JS/BEAM boundary are converted as follows:
- JS primitives ↔ Erlang terms
- JS objects → Erlang maps (with restrictions)
- PIDs are first-class JS values
See src/arc/vm/builtins/arc.gleam for the implementation.
Performance characteristics
Arc is a proof-of-concept runtime optimized for correctness and ES2023+ spec compliance, not production performance. It’s designed for educational purposes and exploring JavaScript/BEAM integration.
Current trade-offs:
- No JIT compilation (pure bytecode interpreter)
- Immutable data structures (Gleam’s persistent collections)
- Heap allocation per operation (GC pressure)
- No inline caching for property access
Possible future optimizations:
- Bytecode-level peephole optimizations
- Inline caching for hot paths
- Primitive specialization (tagged integers)
- Native function inlining
Next steps
- Parser: Recursive descent parsing and Pratt expression handling
- Compiler: Three-phase bytecode compilation pipeline
- VM: Stack machine execution and heap management
- Built-ins: Native JavaScript objects and Arc namespace