Compiler Internals

The Elara compiler transforms source code through 9 distinct phases, each with its own AST representation. This multi-pass architecture enables clean separation of concerns and facilitates optimization.

Compilation Pipeline Overview

Lexing

Convert source text to tokens, apply layout rules

Parsing

Build Frontend AST from tokens

Desugaring

Simplify language constructs (multi-arg lambdas → nested lambdas)

Renaming

Resolve names, make all names unique

Shunting

Apply operator precedence and associativity

Type Checking

Infer and verify types, produce Typed AST

ToCore

Convert to Core IR (typed lambda calculus)

CoreToCore

Apply optimizations on Core IR

Emitting

Generate JVM bytecode

Phase 1: Lexing

Location: src/Elara/Lexer/ The lexer converts source text into a stream of tokens and applies lightweight syntax rules:

Layout Rules

Elara uses Haskell-style layout (indentation-sensitive syntax):

Implicit Braces

let x =     -- '{' inserted after '='
    1       -- At this column
    + 2     -- Same indentation, part of x
let y = 3   -- '}' inserted, dedent detected

Implicit Semicolons

let x = 1   -- ';' inserted at newline
let y = 2   -- ';' inserted

Offside Rule

The lexer tracks offside columns where code must be aligned:

module Main       -- Marks offside column for top-level
let x = 1         -- OK: aligned with 'let'
  let y = 2       -- Error: indented too far
let z = 3         -- OK: aligned

The lexer uses a layout stack to track nested offside contexts (module, let, match, etc.).

Phase 2: Parsing

Location: src/Elara/Parser/ The parser builds the Frontend AST, which closely mirrors source syntax:

data Expr' front
  = Int Integer
  | String Text
  | Var VarRef
  | Lambda (LambdaParam front) (Expr front)
  | FunctionCall (Expr front) (Expr front)
  | Let VarName (TypeAnnotation front) (Expr front)
  | LetIn VarName (TypeAnnotation front) (Expr front) (Expr front)
  | If (Expr front) (Expr front) (Expr front)
  | Match (Expr front) [(Pattern front, Expr front)]
  | Block [Expr front]
  | ...

Key features:

Operator parsing: Infix operators like +, * are parsed as binary expressions
Pattern syntax: Full support for constructor, list, tuple, and wildcard patterns
Type annotations: Optional type signatures on let bindings

Parsing does NOT resolve names or check types - it only verifies syntactic correctness.

Phase 3: Desugaring

Location: src/Elara/Desugar/ Desugaring simplifies the AST while preserving semantics:

Multi-Argument Lambdas

-- Before
\x y z -> body

-- After
\x -> \y -> \z -> body

Let Bindings with Parameters

-- Before
let add x y = x + y

-- After  
let add = \x -> \y -> x + y

Multiple Declarations

Merge all def and let declarations with the same name:

def add : Int -> Int -> Int
let add x y = x + y

-- Merged into single declaration with type annotation

The Desugared AST is still fairly high-level - complex desugarings (like pattern matching) happen later in ToCore.

Phase 4: Renaming

Location: src/Elara/Rename/ The renamer performs name resolution and makes all names unique:

Name Resolution

Module Resolution

Resolve imports, build module dependency graph

Scope Analysis

Track which names are in scope at each point

Qualification

Convert unqualified names to fully qualified names (e.g., map → Prelude.map)

Uniquification

Assign unique IDs to local variables to avoid shadowing issues

Example

import Data.List (map)

let map = \x -> x * 2  -- Local definition shadows import
let result = map [1, 2, 3]

After renaming:

import Data.List (map)

let map_42 = \x -> x * 2  -- Renamed to map_42
let result = map_42 [1, 2, 3]  -- Uses local map_42, not Data.List.map

The Renamed AST uses VarRef to distinguish global references (qualified names) from local references (unique IDs).

Phase 5: Shunting

Location: src/Elara/Shunt/ The shunter applies operator precedence and associativity using the Shunting Yard algorithm:

-- Before (flat binary expressions)
1 + 2 * 3

-- After (correct precedence)
1 + (2 * 3)

Operator precedence is derived from @infix annotations:

@infix(left, 6)
def (+) : Int -> Int -> Int

@infix(left, 7)  
def (*) : Int -> Int -> Int

Operators without precedence annotations default to precedence 9 and emit a warning.

Phase 6: Type Checking

Location: src/Elara/TypeInfer/ The type checker infers types using the Hindley-Milner algorithm:

Constraint Generation: Walk the AST, generate type equality constraints
Constraint Solving: Unify types to find a substitution
Substitution Application: Apply the substitution to produce the Typed AST

See Type Inference for detailed explanation. The output is a Typed AST where every expression is annotated with its type:

data Expr' Typed = ...
type TypedExpr = Expr (Located (Expr' Typed, Monotype SourceRegion))

Phase 7: ToCore Translation

Location: src/Elara/Core/ The ToCore pass converts the high-level Typed AST to Core, a minimal typed lambda calculus:

data Expr bind
  = Var (Var bind)
  | Lam (Var bind) (Expr bind)
  | App (Expr bind) (Expr bind)
  | Let (CoreBind bind) (Expr bind)
  | Case (Expr bind) [Alt bind]
  | Lit Literal
  | Type Type

data CoreBind bind
  = NonRec (Var bind) (Expr bind)
  | Rec [(Var bind, Expr bind)]

Major Transformations

Pattern Match Compilation

Pattern matching is desugared to case expressions with simple patterns:

match list with
  [] -> 0
  x::xs -> x + sum xs

Becomes:

case list of
  Nil -> 0
  Cons x xs -> 
    let sum_xs = sum xs
    in x + sum_xs

Recursive Let Detection

Identify and mark recursive bindings:

let fact n = if n == 0 then 1 else n * fact (n - 1)

Becomes:

letrec fact = \n -> 
  case n == 0 of
    True -> 1
    False -> n * fact (n - 1)

Type Applications

Make implicit type applications explicit:

id 42  -- id : forall a. a -> a

Becomes:

id @Int 42  -- Explicit type application

Core is similar to GHC’s Core language and is designed for optimization.

Phase 8: Core-to-Core Passes

Location: src/Elara/Core/ Multiple optimization passes transform Core:

ANF Conversion

Convert to A-Normal Form (all subexpressions are atomic)

Closure Lifting

Lift nested functions to top-level, generate closure data structures

Dead Code Elimination

Remove unused bindings

Inlining

Inline small functions at call sites (future work)

A-Normal Form (ANF)

ANF ensures all intermediate values are named:

-- Before
f (g x) (h y)

-- After ANF
let a = g x
let b = h y
in f a b

This simplifies subsequent passes like closure conversion.

Closure Lifting

Nested functions are lifted to the top level with explicit environment parameters:

let makeAdder = \x -> \y -> x + y

Becomes:

-- Top-level function with environment parameter
let makeAdder_inner env y = 
  let x = env.x in x + y

-- Original function creates closure
let makeAdder x =
  let env = { x = x }
  in makeAdder_inner env

Phase 9: JVM Emission

Location: src/Elara/JVM/ The final phase generates JVM bytecode:

Lower to JVM IR

Convert Core to high-level JVM IR (src/Elara/JVM/IR.hs)

Emit Bytecode

Translate IR to actual JVM instructions (src/Elara/JVM/Emit.hs)

Generate Class Files

Write .class files with proper constant pools and attributes

See JVM Interop for detailed bytecode generation.

Query System

Elara uses the Rock query system for incremental compilation:

data Query a where
  ParseModule :: ModuleName -> Query (Frontend.Module)
  TypeCheck :: ModuleName -> Query (Typed.Module)
  GenerateCore :: ModuleName -> Query (Core.Module)
  EmitBytecode :: ModuleName -> Query ClassFile

Queries are:

Memoized: Results are cached
Lazy: Only computed when needed
Incremental: Dependencies are tracked automatically

The query system enables features like “compile on demand” where only required modules are processed.

Debugging and Inspection

The compiler provides several debug flags:

elara compile --dump-lexed    # Show lexer output
elara compile --dump-parsed   # Show Frontend AST
elara compile --dump-renamed  # Show Renamed AST
elara compile --dump-typed    # Show Typed AST with inferred types
elara compile --dump-core     # Show Core IR
elara compile --dump-core-anf # Show Core after ANF conversion

Logs are written to .elara/logs/ for inspection.

Advanced Topics

Community

Compiler Internals

Compiler Internals

Compilation Pipeline Overview

Phase 1: Lexing

Layout Rules

Offside Rule

Phase 2: Parsing

Phase 3: Desugaring

Phase 4: Renaming

Name Resolution

Example

Phase 5: Shunting

Phase 6: Type Checking

Phase 7: ToCore Translation

Major Transformations

Phase 8: Core-to-Core Passes

A-Normal Form (ANF)

Closure Lifting

Phase 9: JVM Emission

Query System

Debugging and Inspection

Further Reading

Build docs developers (and LLMs) love

Advanced Topics

Community

​Compiler Internals

​Compilation Pipeline Overview

​Phase 1: Lexing

​Layout Rules

​Offside Rule

​Phase 2: Parsing

​Phase 3: Desugaring

​Phase 4: Renaming

​Name Resolution

​Example

​Phase 5: Shunting

​Phase 6: Type Checking

​Phase 7: ToCore Translation

​Major Transformations

​Phase 8: Core-to-Core Passes

​A-Normal Form (ANF)

​Closure Lifting

​Phase 9: JVM Emission

​Query System

​Debugging and Inspection

​Further Reading

Build docs developers (and LLMs) love

Compiler Internals

Compilation Pipeline Overview

Phase 1: Lexing

Layout Rules

Offside Rule

Phase 2: Parsing

Phase 3: Desugaring

Phase 4: Renaming

Name Resolution

Example

Phase 5: Shunting

Phase 6: Type Checking

Phase 7: ToCore Translation

Major Transformations

Phase 8: Core-to-Core Passes

A-Normal Form (ANF)

Closure Lifting

Phase 9: JVM Emission

Query System

Debugging and Inspection

Further Reading