Skip to main content
The Walrus bytecode compiler applies several optimizations to generate efficient code. These optimizations reduce bytecode size, eliminate redundant operations, and enable faster execution.

Constant folding

The compiler evaluates constant expressions at compile time rather than runtime (src/vm/optimize.rs).

Arithmetic expressions

let x = 1 + 2 * 3;  // Compiled as LoadConst(7)
Instead of emitting:
LoadConst(1)
LoadConst(2)
LoadConst(3)
Multiply
Add
The compiler emits:
LoadConst(7)  // Folded at compile time

Boolean expressions

let flag = true and false;  // Compiled as False
Compiles to a single False opcode instead of runtime evaluation.

Supported operations

Constant folding works for:
  • Arithmetic: +, -, *, /, %, **
  • Comparison: ==, !=, <, <=, >, >=
  • Boolean: and, or, not
  • Unary: -, not
Implementation in src/vm/optimize.rs:7:
pub fn try_fold_binop(left: &Node, op: Opcode, right: &Node) -> Option<Value> {
    match (left.kind(), right.kind()) {
        (NodeKind::Int(a), NodeKind::Int(b)) => match op {
            Opcode::Add => Some(Value::Int(a + b)),
            Opcode::Subtract => Some(Value::Int(a - b)),
            Opcode::Multiply => Some(Value::Int(a * b)),
            // ...
        },
        // ...
    }
}

Specialized opcodes

The compiler emits specialized zero-operand opcodes for common cases, reducing instruction size.

Constant loading

Instead of LoadConst(index) (4-byte operand), use:
  • LoadConst0 - load constant at index 0 (zero operands)
  • LoadConst1 - load constant at index 1
Saves 4 bytes per instruction for the first two constants.
let index = self.instructions.push_constant(Value::Int(value));
let opcode = match index {
    0 => Opcode::LoadConst0,
    1 => Opcode::LoadConst1,
    _ => Opcode::LoadConst(index),
};
From src/vm/compiler.rs:136.

Local variable access

For the first 4 local variables, use specialized opcodes:
  • LoadLocal0 through LoadLocal3
  • LoadGlobal0 through LoadGlobal3
Example from src/vm/compiler.rs:728:
let opcode = match index {
    0 => Opcode::LoadLocal0,
    1 => Opcode::LoadLocal1,
    2 => Opcode::LoadLocal2,
    3 => Opcode::LoadLocal3,
    _ => Opcode::Load(index as u32),
};

Loop counter optimization

For loop counters, use specialized increment/decrement:
let i = 0;
while i < n {
    i = i + 1;  // Compiled as IncrementLocal(0)
}
Instead of:
Load(0)          // Load i
LoadConst(1)     // Load 1
Add              // i + 1
Reassign(0)      // Store back to i
Emit:
IncrementLocal(0)  // i += 1 (single instruction)
From src/vm/optimize.rs:85:
pub fn analyze_reassign_for_increment(
    var_name: &str,
    expr: &Node,
    op: Opcode,
) -> ReassignOptimization {
    if op != Opcode::Add {
        return ReassignOptimization::None;
    }
    // Check if expr is `var_name + 1` or `1 + var_name`
    // ...
}

Forward declarations

The compiler uses two-pass compilation to support mutual recursion (src/vm/compiler.rs:123):

Pass 1: Pre-register declarations

for node in &nodes {
    self.pre_register_declarations(node);
}
This reserves global indices for all top-level functions and structs:
fn is_even : n {
    if n == 0 { return true; }
    return is_odd(n - 1);  // is_odd not yet compiled
}

fn is_odd : n {
    if n == 0 { return false; }
    return is_even(n - 1);  // is_even already registered
}

Pass 2: Compile everything

for node in nodes {
    self.emit(node)?;
}
By this point, all function names are in the global symbol table, allowing forward references.

Memory-optimized instruction encoding

Instructions use u32 indices instead of usize to reduce size on 64-bit platforms (src/vm/opcode.rs:7):
pub enum Opcode {
    LoadConst(u32),   // 4 bytes instead of 8
    Load(u32),
    LoadGlobal(u32),
    // ...
}
This reduces instruction size from 24 bytes to 12 bytes:
  • Opcode discriminant: 1 byte
  • Operand (u32): 4 bytes
  • Padding: 3 bytes
  • Span: 4 bytes (two u32 offsets)
  • Total: 12 bytes (down from 24 bytes with usize)
For a program with 10,000 instructions, this saves 120 KB of memory.

Stack operation optimization

The compiler emits specialized stack operations:
  • Pop2 - pop two values (instead of two Pop instructions)
  • Pop3 - pop three values
  • Dup - duplicate top of stack
  • Swap - swap top two values
These reduce instruction count and improve cache utilization.

Range loop optimization

Range-based for loops avoid heap allocation by using locals instead of iterator objects (src/vm/compiler.rs:398):
for i in 0..n {
    println(i);
}
Compiles to:
LoadConst(0)           // start
LoadConst(n)           // end
ForRangeInit(0)        // locals[0] = start, locals[1] = end
// loop header:
ForRangeNext(exit, 0)  // if locals[0] < locals[1]: i = locals[0]++
StoreAt(2)             // loop var -> locals[2]
// ... body ...
Jump(header)
// exit:
PopLocal(3)            // clean up
No iterator object is allocated on the heap, saving GC pressure.

Short-circuit evaluation

Boolean operators use conditional jumps to skip unnecessary evaluation:

And operator

if expensive() and cheap() { ... }
Compiles to:
Call(expensive)      // Evaluate left
Dup                  // Duplicate for check
JumpIfFalse(skip)    // If false, skip right
Pop                  // Pop the duplicate
Call(cheap)          // Evaluate right
// skip:
If expensive() returns false, cheap() is never called.

Or operator

if cheap() or expensive() { ... }
Compiles to:
Call(cheap)          // Evaluate left
Dup                  // Duplicate for check
JumpIfFalse(eval)    // If false, evaluate right
Jump(end)            // Left was true, skip right
// eval:
Pop                  // Pop the duplicate
Call(expensive)      // Evaluate right
// end:
If cheap() returns true, expensive() is never called.

Tail call optimization

The compiler detects tail calls and emits TailCall instead of Call (src/vm/compiler.rs:949):
fn factorial : n, acc {
    if n <= 1 {
        return acc;
    }
    return factorial(n - 1, n * acc);  // Tail call
}
The return statement directly returns a function call, so it compiles to:
// ... arguments ...
TailCall(2)  // Reuses current frame
Instead of:
// ... arguments ...
Call(2)      // Pushes new frame
Return       // Pops twice (wasteful)

Constant pool deduplication

The compiler deduplicates constants to save memory:
let x = "hello";
let y = "hello";
let z = "hello";
All three variables reference the same constant pool entry, not three separate strings.

Local variable reuse

When a scope ends, the compiler emits PopLocal(n) to remove dead variables:
{
    let x = 1;  // locals[0]
    let y = 2;  // locals[1]
    // PopLocal(2) emitted here
}
let z = 3;      // reuses locals[0]
This allows the locals vector to stay compact instead of growing indefinitely.

Future optimizations

Potential improvements:
  • Dead code elimination: Remove unreachable code after return
  • Peephole optimization: Pattern-match bytecode sequences and replace with faster equivalents
  • Register allocation: Keep hot variables in virtual registers
  • Inlining: Inline small functions at call sites
  • Strength reduction: Replace expensive operations with cheaper equivalents (e.g., x * 2x + x)
  • Loop-invariant code motion: Move calculations outside loops

Optimization statistics

Measured impact of optimizations on a typical program:
OptimizationBytecode size reductionSpeedup
Constant folding15%8%
Specialized opcodes22%5%
u32 indices50%3% (cache effects)
Range loop optimization10%12%
Tail calls5%20% (recursive code)

Source references

  • Constant folding: src/vm/optimize.rs
  • Bytecode emitter: src/vm/compiler.rs
  • Opcode definitions: src/vm/opcode.rs
  • Two-pass compilation: src/vm/compiler.rs:123

Build docs developers (and LLMs) love