Walrus includes an optional Just-In-Time (JIT) compiler powered by Cranelift that compiles hot bytecode regions to native machine code. When enabled, the JIT can provide dramatic speedups for tight loops.

Performance

| Benchmark | Interpreter | JIT | Speedup |
|---|---|---|---|
| Sum loop (10K × sum(0..1000)) | 0.68s | 0.01s | 68x |

Building with JIT support

The JIT compiler requires the jit feature flag:
# Build with JIT enabled
cargo build --release --features jit

# Or add to Cargo.toml
[features]
jit = ["cranelift-codegen", "cranelift-frontend", 
       "cranelift-jit", "cranelift-module", "cranelift-native"]

Running with JIT

Enable JIT compilation at runtime:
# Run with JIT enabled
walrus program.walrus --jit

# Show JIT profiling statistics
walrus program.walrus --jit --jit-stats

# Disable profiling (baseline comparison)
walrus program.walrus --no-jit-profile

Hot-spot detection

The VM profiles execution to identify “hot” code regions suitable for JIT compilation (src/jit/hotspot.rs):
  • Loop threshold: 1000 iterations
  • Function threshold: 100 calls
  • Type stability: Monomorphic operations only
When a loop becomes hot, the JIT compiler analyzes and compiles it to native code.

How detection works

  1. The compiler registers loops during bytecode generation
  2. The VM tracks iteration counts at loop headers
  3. When a count exceeds its threshold, the region is marked as “hot”
  4. The bytecode is analyzed for JIT compatibility
  5. If suitable, it is compiled to native code
// From src/jit/hotspot.rs
pub const LOOP_HOT_THRESHOLD: u32 = 1000;
pub const FUNCTION_HOT_THRESHOLD: u32 = 100;
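The counting step can be sketched in plain Rust. This is an illustrative model, not the actual `src/jit/hotspot.rs` API: the `HotSpotTracker` type and `record_loop_iteration` method are hypothetical names, and the real implementation may batch or sample counts differently.

```rust
use std::collections::HashMap;

pub const LOOP_HOT_THRESHOLD: u32 = 1000;

// Hypothetical sketch of per-loop-header iteration counting.
#[derive(Default)]
struct HotSpotTracker {
    loop_counts: HashMap<usize, u32>, // loop header ip -> iteration count
}

impl HotSpotTracker {
    /// Called each time execution passes a loop header.
    /// Returns true exactly once, when the loop crosses the hot threshold.
    fn record_loop_iteration(&mut self, header_ip: usize) -> bool {
        let count = self.loop_counts.entry(header_ip).or_insert(0);
        *count += 1;
        *count == LOOP_HOT_THRESHOLD
    }
}
```

Returning `true` only on the crossing iteration means the VM triggers compilation once per loop rather than on every subsequent iteration.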

JIT-compatible patterns

The JIT compiler currently supports integer range loops with specific patterns:

Sum accumulation

let sum = 0;
for i in 0..n {
    sum = sum + i;
}
Compiles to an optimized native loop using integer addition.

Count iterations

let count = 0;
for i in 0..n {
    count = count + 1;
}

Simple printing

for i in 0..n {
    println(i);
}
Print operations call back to Rust via external functions.

Combined patterns

let sum = 0;
for i in 0..n {
    sum = sum + i;
    println(i);
}
Accumulation and printing can be combined.

Not JIT-compatible

The following patterns fall back to the interpreter:

Function calls

for i in 0..n {
    result = compute(i);  // Contains Call opcode
}

Multiple prints per iteration

for i in 0..n {
    print(i);      // Multiple prints not supported
    print(" ");
}

Complex operations

for i in 0..n {
    list.push(i);  // Method calls not supported
}

String operations

for i in 0..n {
    result = result + "x";  // String concat not supported
}

Compilation architecture

The JIT compilation pipeline (src/jit/compiler.rs):

1. Bytecode analysis

Analyze the loop body to determine the computation pattern:
fn analyze_int_range_loop(
    instructions: &InstructionSet,
    header_ip: usize,
    exit_ip: usize,
) -> JitResult<LoopAnalysis>
Detects:
  • Accumulator variable
  • Arithmetic operations (add, subtract, multiply)
  • Print/println operations
  • Invalid operations (function calls, etc.)
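The analysis result can be modeled roughly as follows. This is a sketch only; the field names and `LoopOp` variants are illustrative and may not match the actual `LoopAnalysis` in `src/jit/compiler.rs`.

```rust
// Illustrative model of what bytecode analysis produces.
#[derive(Debug, PartialEq)]
enum LoopOp {
    Add,      // acc = acc + i
    CountUp,  // acc = acc + 1
    PrintlnI, // println(i)
}

#[derive(Debug, Default)]
struct LoopAnalysis {
    accumulator_slot: Option<usize>, // local slot of the accumulator, if any
    ops: Vec<LoopOp>,                // recognized operations in the loop body
    jit_compatible: bool,            // false if an unsupported opcode appears
}
```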

2. Cranelift IR generation

Translate bytecode to Cranelift intermediate representation:
// Simplified example
let loop_header = builder.create_block();
builder.append_block_param(loop_header, types::I64); // i
builder.append_block_param(loop_header, types::I64); // acc

let cond = builder.ins().icmp(IntCC::SignedLessThan, i, end);
builder.ins().brif(cond, loop_body, &[], loop_exit, &[acc]);

// Loop body: acc = acc + i
let new_acc = builder.ins().iadd(acc, i);
let next_i = builder.ins().iadd(i, one);
builder.ins().jump(loop_header, &[next_i, new_acc]);

3. Native code compilation

Cranelift compiles the IR to machine code optimized for the target CPU:
module.define_function(func_id, &mut ctx)?;
module.finalize_definitions()?;
let func_ptr = module.get_finalized_function(func_id);

4. Execution

The VM calls the JIT-compiled function directly:
let result = unsafe {
    let func: IntRangeAccumFn = mem::transmute(func_ptr);
    func(start, end, initial_acc)
};
Return values are stored back to local variables.
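The `transmute` above assumes an `extern "C"` function-pointer type. Here is a sketch of that signature together with a plain-Rust stand-in that computes the same result as the JIT-compiled sum loop; `sum_loop` is an illustrative stand-in, not part of the codebase.

```rust
// Assumed shape of the JIT-compiled entry point (illustrative).
type IntRangeAccumFn = extern "C" fn(start: i64, end: i64, initial_acc: i64) -> i64;

// A plain-Rust stand-in with the same semantics as the compiled sum loop:
// iterates i over start..end, accumulating into acc.
extern "C" fn sum_loop(start: i64, end: i64, initial_acc: i64) -> i64 {
    let mut acc = initial_acc;
    let mut i = start;
    while i < end {
        acc += i;
        i += 1;
    }
    acc
}
```

Because the signature is fixed at `(i64, i64, i64) -> i64`, the VM can call any compiled range loop through the same pointer type without per-loop glue code.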

External callbacks

Print operations in JIT-compiled code call back to Rust:
extern "C" fn jit_print_int(value: i64) {
    print!("{}", value);
}

extern "C" fn jit_println_int(value: i64) {
    println!("{}", value);
}
These are declared as external symbols in the JIT module:
builder.symbol("jit_print_int", jit_print_int as *const u8);
builder.symbol("jit_println_int", jit_println_int as *const u8);
Cranelift emits call instructions to these functions.

Type profiling

The VM tracks runtime types at key program points (src/jit/types.rs):
pub struct TypeProfile {
    observations: FxHashMap<usize, TypeFeedback>,
}

pub struct TypeFeedback {
    types: FxHashMap<WalrusType, u32>,  // Type -> count
}
Type stability is checked before compilation:
pub fn is_monomorphic(&self) -> bool {
    self.types.len() == 1  // Only one type observed
}
Polymorphic code (multiple types at same location) is not JIT compiled.
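The feedback loop can be sketched as below. This is a simplified model: it uses `std::collections::HashMap` where the real `src/jit/types.rs` uses `FxHashMap`, and the `WalrusType` variants and `observe` method are illustrative assumptions.

```rust
use std::collections::HashMap;

// Simplified stand-in for the runtime type enum.
#[derive(Hash, PartialEq, Eq, Clone, Copy, Debug)]
enum WalrusType { Int, Float, Str }

#[derive(Default)]
struct TypeFeedback {
    types: HashMap<WalrusType, u32>, // type -> observation count
}

impl TypeFeedback {
    /// Record one runtime type observation at this program point.
    fn observe(&mut self, ty: WalrusType) {
        *self.types.entry(ty).or_insert(0) += 1;
    }

    /// A site is monomorphic when only one type has ever been seen there.
    fn is_monomorphic(&self) -> bool {
        self.types.len() == 1
    }
}
```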

JIT statistics

With --jit-stats, the VM prints compilation statistics:
Hot-spot Statistics:
  Total tracked regions: 3
  Hot loops: 1
  Hot functions: 1
  JIT compiled regions: 1
  Hottest spot: [email protected] (402000x)

Type Profile: 4 locations observed
JIT Stats: 1 functions compiled

Optimizations

The JIT compiler applies several optimizations:

Register allocation

Cranelift’s register allocator keeps loop variables in CPU registers, avoiding memory loads.

Loop unrolling

Cranelift may unroll small loop bodies for better instruction-level parallelism.

Constant folding

Constants in the loop are folded at compile time.

Inlining

External calls to print functions are inlined when beneficial.

Limitations

The current JIT implementation supports:
  • Integer range loops only (for i in start..end)
  • Integer arithmetic (add, subtract, multiply)
  • Integer comparisons
  • Print/println with integers
  • Monomorphic types (single type per operation)
Not yet supported:
  • Iterator-based loops (for x in list)
  • Floating-point operations
  • String operations
  • Function calls
  • Method calls
  • Polymorphic loops (mixed types)
  • Nested loops (outer loop JIT only)

Future enhancements

  • Polymorphic inlining: Generate specialized code for top-N types
  • Escape analysis: Allocate short-lived objects on stack
  • Function inlining: Inline small function calls
  • SIMD: Use vector instructions for data-parallel operations
  • Floating-point support: JIT compile float loops
  • Multi-tier compilation: Quick compile hot code, optimize later

Source references

  • JIT compiler: src/jit/compiler.rs
  • Hot-spot detector: src/jit/hotspot.rs
  • Type profiling: src/jit/types.rs
  • VM integration: src/vm/mod.rs:595 (try_jit_range_loop)
