Your mission is to optimize the build_kernel() method in the KernelBuilder class to minimize cycle count on a simulated machine.

Objective

Primary Goal

Optimize KernelBuilder.build_kernel() to execute in the minimum number of cycles as measured by the frozen simulator in tests/submission_tests.py.

What You’re Optimizing

The kernel implements a batch tree traversal with cryptographic hashing:
def build_kernel(self, forest_height: int, n_nodes: int, batch_size: int, rounds: int):
    """
    Like reference_kernel2 but building actual instructions.
    Scalar implementation using only scalar ALU and load/store.
    """

The Algorithm

For each round and each item in the batch:
  1. Load the current tree index and value from memory
  2. Read the node value from the forest at that index
  3. XOR the input value with the node value
  4. Hash the result through multiple stages (HASH_STAGES)
  5. Compute the next tree index based on the hash (left or right child)
  6. Wrap the index if it exceeds the tree bounds
  7. Store the updated index and value back to memory
The reference implementation is a scalar implementation using only scalar ALU, load, and store operations.
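The steps above can be sketched in plain Python. This is an illustrative model only: the names (`step`, `mix`), the `HASH_STAGES` count, and the hash function itself are assumptions for readability, not the simulator's actual identifiers or hash algorithm.

```python
HASH_STAGES = 4  # assumed stage count for illustration; see problem.py for the real value

def mix(x: int, stage: int) -> int:
    # Stand-in for one hash stage; the real kernel's hash differs.
    return ((x * 2654435761) ^ (x >> 13) ^ stage) & 0xFFFFFFFF

def step(forest: list[int], n_nodes: int, idx: int, val: int) -> tuple[int, int]:
    """One traversal step for one batch item (steps 1-7 above)."""
    node = forest[idx]                    # 2. read node value at the current index
    h = val ^ node                        # 3. XOR input value with node value
    for s in range(HASH_STAGES):          # 4. hash through multiple stages
        h = mix(h, s)
    idx = 2 * idx + (1 if h & 1 else 2)   # 5. left or right child based on the hash
    if idx >= n_nodes:                    # 6. wrap if the index exceeds tree bounds
        idx = 0
    return idx, h                         # 7. updated index and value to store back
```

The reference kernel emits scalar ALU, load, and store instructions for each of these steps; the optimization opportunity lies in how those instructions are scheduled and overlapped.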

What You Can Modify

You have full freedom to modify KernelBuilder.build_kernel()
You can change instruction sequences, memory access patterns, and algorithms
You can use any available machine features (VLIW, vector operations, etc.)
You can add helper methods to the KernelBuilder class
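As an illustration of the helper-method pattern, here is a sketch. The `KernelBuilder` stand-in, the `HASH_STAGES` count, and the `("hash_stage", ...)` opcode are all hypothetical; only the `self.add(engine, op)` call shape is suggested by the debug examples under Debugging Tools below.

```python
class KernelBuilder:
    """Minimal stand-in; the real class lives in the repository."""
    HASH_STAGES = 4  # assumed

    def __init__(self):
        self.instrs = []

    def add(self, engine, op):
        # The real add() presumably does more; this just records the instruction.
        self.instrs.append((engine, op))

    def emit_hash(self, src, dst):
        # A helper of the kind you might add: emit all hash stages for one
        # value in a single call, so the main loop stays readable.
        for stage in range(self.HASH_STAGES):
            self.add("alu", ("hash_stage", stage, src, dst))
            src = dst
```

Factoring repeated instruction sequences into helpers like this makes it much easier to later reorder or interleave them across batch items.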

What You Cannot Modify

Critical: Do not modify anything in the tests/ folder. The submission harness uses a frozen copy of the simulator to validate your results.
Do not change the problem specification (tree structure, hash algorithm, etc.)
Do not modify test parameters (forest_height=10, rounds=16, batch_size=256)

Rules and Constraints

Machine Architecture

You’re compiling for a simulated VLIW machine with:
  • Multiple execution engines (ALU, load, store, flow, debug)
  • Configurable instruction bundling
  • Specific cycle costs per operation
  • Limited scratch space (SCRATCH_SIZE)
  • Slot limits per engine (SLOT_LIMITS)
Review problem.py to understand the machine architecture, instruction set, and available operations.
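To make the slot-limit constraint concrete, here is a toy greedy bundler. The `SLOT_LIMITS` values are made up, and real scheduling must also respect data dependencies and per-operation cycle costs, which this sketch ignores; consult problem.py for the actual rules.

```python
# Hypothetical per-engine slot limits; the real values are in problem.py.
SLOT_LIMITS = {"alu": 4, "load": 2, "store": 2, "flow": 1}

def bundle(instrs):
    """Greedily pack (engine, op) pairs into VLIW bundles without
    exceeding any engine's slot limit. Ignores data dependencies."""
    bundles, current, used = [], [], {}
    for engine, op in instrs:
        if used.get(engine, 0) >= SLOT_LIMITS[engine]:
            # This engine's slots are full: close the bundle, start a new one.
            bundles.append(current)
            current, used = [], {}
        current.append((engine, op))
        used[engine] = used.get(engine, 0) + 1
    if current:
        bundles.append(current)
    return bundles
```

The point of the exercise is that independent work from different batch items can fill otherwise-empty slots in the same bundle, which is where most of the cycle savings come from.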

Correctness Requirements

Your optimized kernel must:
  1. Produce identical output to the reference kernel for all test cases
  2. Pass the correctness test in tests/submission_tests.py
  3. Use only valid instructions supported by the simulator

Validation Process

Validate your submission using:
# This should be empty - the tests folder must be unchanged
git diff origin/main tests/

# You should pass some of these tests
python tests/submission_tests.py
The testing harness in perf_takehome.py includes debug validation with pause instructions that must match the reference kernel’s yields. The submission harness ignores these debug features.

The LLM Cheating Problem

Of the solutions received on the first day after release, none that claimed fewer than 1,300 cycles was valid. In every case, a language model had modified the tests to make the problem easier.

Common Cheating Patterns

Example: A model might:
  1. Notice multicore support in problem.py
  2. Implement multicore as an “optimization”
  3. Notice N_CORES = 1 prevents speedup
  4. “Fix” the core count to get artificial speedup
Multicore is intentionally disabled in this version. Don’t try to enable it!

Best Practices for AI-Assisted Development

If using an AI agent:
  1. Instruct it not to modify tests/. Explicitly tell your AI not to change anything in the tests/ folder.
  2. Use submission_tests.py for verification. Always validate using the official test suite.
  3. Verify test integrity. Run git diff origin/main tests/ before submitting.
  4. Understand your optimizations. Be able to explain what your code actually does.

Performance Test

The main performance benchmark:
def test_kernel_cycles(self):
    do_kernel_test(10, 16, 256)
Parameters:
  • forest_height: 10 (creates a binary tree with 1,023 nodes)
  • rounds: 16 (number of full batch iterations)
  • batch_size: 256 (items processed per round)

Baseline Performance

147,734 cycles - The unoptimized scalar implementation
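A quick sanity check on the benchmark parameters and this baseline, using only numbers stated above:

```python
forest_height, rounds, batch_size = 10, 16, 256

n_nodes = 2 ** forest_height - 1       # complete binary tree: 1023 nodes
traversal_steps = rounds * batch_size  # 4096 item-steps per benchmark run

baseline_cycles = 147_734
per_step = baseline_cycles / traversal_steps  # roughly 36 cycles per item-step
```

At roughly 36 baseline cycles per item-step, even modest instruction-level parallelism across the 256 independent batch items leaves a lot of headroom.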

Debugging Tools

The repository includes several debugging aids:

Trace Visualization

# Generate a hot-reloadable trace (Chrome only)
python perf_takehome.py Tests.test_kernel_trace

# In another terminal
python watch_trace.py
This opens a Perfetto trace viewer that auto-updates as you re-run tests.

Debug Instructions

The kernel supports debug instructions:
self.add("debug", ("compare", addr, (round, i, "label")))
self.add("debug", ("comment", "Helpful message"))
These are ignored by the submission simulator but can help during development.
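One convenient pattern is to wrap debug emission in small helpers gated by a flag, so development builds get checks and clean builds stay compact. The mixin below is hypothetical; it assumes only the `self.add("debug", ...)` call shape shown above.

```python
class DebugHelpers:
    """Hypothetical mixin for KernelBuilder; add() signature assumed
    from the debug examples above."""
    DEBUG = True

    def debug_compare(self, addr, expected):
        # The submission simulator ignores debug instructions anyway,
        # but skipping them when DEBUG is off keeps traces smaller.
        if self.DEBUG:
            self.add("debug", ("compare", addr, expected))

    def debug_comment(self, msg):
        if self.DEBUG:
            self.add("debug", ("comment", msg))
```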

Next Steps

  • View Benchmarks: see what performance levels to aim for.
  • Study the Code: dive into the source code and start optimizing.
