Benchmark suite

The benchmarks/ directory contains a comprehensive benchmark suite comparing Python and Walrus performance across various computational patterns.

Running benchmarks

Run all benchmarks

1. Navigate to the benchmarks directory:

   cd benchmarks

2. Execute the benchmark script. The script will automatically build Walrus in release mode if needed:

   ./run_benchmarks.sh

3. Review the results. The script outputs timing comparisons for each benchmark, showing which language is faster and by what factor.

Run individual benchmarks

You can run the Python or Walrus version of a specific benchmark separately:
python3 benchmarks/01_quicksort.py
./target/release/walrus benchmarks/01_quicksort.walrus
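To time a single benchmark pair by hand, a minimal Python harness along these lines works. This is a sketch, not part of the suite; the file paths in the commented example are assumptions to adjust for your checkout and build.

```python
import subprocess
import sys
import time

def time_command(cmd):
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

# Example usage (paths are hypothetical -- adjust to your build):
# py = time_command([sys.executable, "benchmarks/01_quicksort.py"])
# wa = time_command(["./target/release/walrus", "benchmarks/01_quicksort.walrus"])
# print(f"Python: {py:.2f}s  Walrus: {wa:.2f}s  ratio: {py / wa:.2f}x")
```

Note this measures whole-process wall-clock time, so interpreter startup and Walrus compilation are included, just as in the official script.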

Benchmark categories

Sorting algorithms

| Benchmark | Description | Scale |
| --- | --- | --- |
| 01_quicksort | Quicksort algorithm | 5,000 elements |
| 02_bubble_sort | O(n²) bubble sort | 1,000 elements |
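For orientation, here is a sketch of the kind of quicksort workload the first benchmark measures; the shipped benchmark file may differ in detail (pivot choice, in-place vs. copying), but the scale matches.

```python
import random

def quicksort(xs):
    """Recursive quicksort over a list, returning a new sorted list."""
    if len(xs) <= 1:
        return xs
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return quicksort(left) + mid + quicksort(right)

random.seed(42)                                    # deterministic input
data = [random.randint(0, 100_000) for _ in range(5_000)]
result = quicksort(data)
assert result == sorted(data)                      # correctness check
```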

Recursion tests

| Benchmark | Description | Scale |
| --- | --- | --- |
| 03_fibonacci_recursive | Naive recursive Fibonacci | fib(30) - exponential time |
| 04_fibonacci_iterative | Iterative Fibonacci | 100,000 iterations |
| 13_factorial | Recursive factorial | factorial(12) - 100k calls |
| 15_ackermann | Ackermann function | A(3,7) - extreme recursion |
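As an example of the recursion pattern being stressed, the naive Fibonacci benchmark boils down to something like the following sketch (the actual file may differ, but fib(30) is the stated scale):

```python
def fib(n):
    """Naive recursive Fibonacci: exponential time, pure function-call stress."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# fib(30) makes millions of calls, measuring call overhead rather than arithmetic.
print(fib(30))  # 832040
```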

Memory and GC stress

| Benchmark | Description | Scale |
| --- | --- | --- |
| 05_gc_stress | Object allocation/deallocation | 50,000 temporary objects |
| 06_gc_linked_structures | Linked list creation | 1,000 lists × depth 100 |
| 16_tree_traversal | Binary tree operations | depth 15 |
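A sketch of the linked-structures pattern at the stated scale (1,000 lists × depth 100); the `Node` class and traversal are illustrative, not the benchmark's exact code:

```python
class Node:
    """A singly linked list node; bulk allocation of these stresses the GC."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def build_list(depth):
    """Build a linked list of `depth` nodes with values 0..depth-1."""
    head = None
    for i in range(depth):
        head = Node(i, head)
    return head

total = 0
for _ in range(1_000):
    head = build_list(100)        # allocate a 100-node chain
    node = head
    while node is not None:       # traverse so the work is not dead code
        total += node.value
        node = node.next
    # the whole chain becomes garbage here, exercising the collector
print(total)  # 4950000
```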

Iteration performance

| Benchmark | Description | Scale |
| --- | --- | --- |
| 07_iteration_heavy | Simple counting loop | 1,000,000 iterations |
| 08_nested_loops | Triple nested loops | 100³ = 1M iterations |
| 17_while_loop | While loop counting | 1,000,000 iterations |
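The nested-loops benchmark is the simplest of these to picture; a sketch of the 100³ pattern (illustrative, not the file's exact contents):

```python
def nested_loops(n=100):
    """Triple nested loop: n^3 iterations of trivial work in the inner body."""
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                count += 1
    return count

print(nested_loops())  # 1000000
```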

Data structures

| Benchmark | Description | Scale |
| --- | --- | --- |
| 09_list_operations | List append, access, iteration | 10,000 elements |
| 10_dict_operations | Dictionary insert and lookup | 10,000 entries |
| 11_string_concat | String concatenation | 10,000 characters |
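As an example of the shape of these workloads, the dictionary benchmark amounts to something like this sketch (key format and checksum are illustrative assumptions):

```python
def dict_ops(n=10_000):
    """Insert n entries, then look each one back up; returns a checksum."""
    d = {}
    for i in range(n):
        d[f"key{i}"] = i          # insert phase
    total = 0
    for i in range(n):
        total += d[f"key{i}"]     # lookup phase
    return total

print(dict_ops())  # 49995000
```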

Algorithms

| Benchmark | Description | Scale |
| --- | --- | --- |
| 12_prime_sieve | Sieve of Eratosthenes | up to 50,000 |
| 14_matrix_multiply | Matrix multiplication | 50×50 matrices |
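The sieve benchmark follows the classic algorithm; a sketch at the stated scale (the shipped file may structure it differently):

```python
def prime_sieve(limit):
    """Sieve of Eratosthenes: count primes up to and including `limit`."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # cross off multiples starting at p*p; smaller ones are already marked
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return sum(is_prime)

print(prime_sieve(50_000))
```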

Language features

| Benchmark | Description | Scale |
| --- | --- | --- |
| 18_closure_stress | Function call overhead | 50,000 calls |
| 19_arithmetic_heavy | Heavy arithmetic operations | 1,000,000 iterations |
| 20_method_dispatch | Struct/class method calls | 100,000 iterations |
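The method-dispatch benchmark calls a method on an object in a hot loop; a sketch of the pattern on the Python side (class and method names are illustrative):

```python
class Counter:
    """Tiny class whose method is called in a hot loop to measure dispatch cost."""
    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount
        return self.value

c = Counter()
for _ in range(100_000):
    c.increment(1)        # each call pays the dynamic-dispatch cost
print(c.value)  # 100000
```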

Interpreting results

Lower execution times are better. For each benchmark, the script displays:
  • Execution time for each language
  • Comparative speedup factor
  • Color-coded results (green for Walrus wins, red for Python wins)
Sample output:
==========================================
Benchmark: 01_quicksort
==========================================

--- Python ---
Time: 0.45s

--- Walrus ---
Time: 0.32s

Walrus is 1.41x faster

Factors affecting results

Performance results may vary based on several factors:
  • Python version - Python 3.11+ includes adaptive interpreter optimizations that significantly improve performance
  • System hardware - CPU, RAM, and system load affect benchmarks
  • Compilation options - Ensure Walrus is built with --release flag
  • Startup time - Both interpreters include initialization overhead
  • Compilation time - Walrus parsing and bytecode generation are included in the measured time

JIT benchmark comparison

For JIT-enabled builds, you can compare interpreter vs JIT performance:
# Build with JIT support
cargo build --release --features jit

# Run without JIT
./target/release/walrus benchmarks/07_iteration_heavy.walrus

# Run with JIT enabled
./target/release/walrus benchmarks/07_iteration_heavy.walrus --jit

# Show JIT statistics
./target/release/walrus benchmarks/07_iteration_heavy.walrus --jit --jit-stats
Expected JIT speedup (hot loops):
| Pattern | Interpreter | JIT | Speedup |
| --- | --- | --- | --- |
| 10K iterations × sum(0..1000) | 0.68s | 0.01s | ~68× |
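The measured pattern, as the table describes it, resembles the following Python sketch; the Walrus version would write the inner loop as `for j in 0..1000`:

```python
def hot_loop(outer=10_000, inner=1_000):
    """10K iterations of summing 0..999 -- the hot loop body a JIT can compile."""
    total = 0
    for _ in range(outer):
        s = 0
        for j in range(inner):
            s += j
        total += s
    return total

print(hot_loop())  # 4995000000
```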

Notes on fairness

  • Both languages run equivalent algorithms for fair comparison
  • Syntax differences are minimal (e.g., 0..n vs range(n))
  • Memory usage is not directly measured, only GC stress patterns
  • Python startup time and Walrus compilation time are both included

Syntax differences

| Feature | Python | Walrus |
| --- | --- | --- |
| Range loop | for i in range(n): | for i in 0..n { |
| Function def | def foo(x): | fn foo : x { |
| None/null | None | void |
| Print | print(x) | println(x) |
| F-string | f"x={x}" | f"x={x}" |
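To illustrate how small the gap is, here is a runnable Python benchmark body with the corresponding Walrus form of each line shown in comments (the Walrus lines are assembled from the table above, so treat them as assumptions about exact layout):

```python
# Python version                 # Walrus equivalent (assumed, per table)
def foo(x):                      # fn foo : x {
    total = 0
    for i in range(x):           #     for i in 0..x {
        total += i
    print(f"total={total}")      #     println(f"total={total}")
    return total

foo(5)  # prints "total=10"
```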

Contributing benchmarks

When adding new benchmarks:
  1. Create both .walrus and .py files with the same basename
  2. Ensure algorithms are equivalent
  3. Include appropriate scale for meaningful timing
  4. Add descriptions to the benchmark README
  5. Test that both implementations produce correct results
See Contributing for more details.
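The checklist above can be sketched as a minimal skeleton for the Python half of a new benchmark pair; names, scale, and output format here are illustrative, not a prescribed template:

```python
import time

def workload():
    """Replace with the algorithm under test; keep it equivalent to the .walrus file."""
    total = 0
    for i in range(1_000_000):   # choose a scale large enough for meaningful timing
        total += i
    return total

start = time.perf_counter()
result = workload()
elapsed = time.perf_counter() - start

assert result == 499_999_500_000   # step 5: verify the result is correct
print(f"Time: {elapsed:.2f}s")
```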
