
Overview

The testing module provides comprehensive utilities for validating kernel correctness and measuring performance. It includes functions for running individual tests, test classes for organized testing, and utilities for building memory images and debugging.

do_kernel_test()

Executes a kernel test with specified parameters and validates correctness against the reference implementation.
forest_height (int, required)
  Height of the binary tree structure. Determines the number of nodes as 2^(height+1) - 1.

rounds (int, required)
  Number of rounds to execute the kernel traversal.

batch_size (int, required)
  Number of parallel tree traversals to process in the batch.

seed (int, default: 123)
  Random seed for reproducible test generation.

trace (bool, default: False)
  Enable Perfetto trace generation for performance visualization.

prints (bool, default: False)
  Enable detailed debug output during execution.

Returns

Returns the total cycle count (int) for the kernel execution.
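
For sizing intuition, the node count implied by forest_height can be checked with a quick sketch of the formula above:

```python
def n_nodes(height: int) -> int:
    # A full binary tree of the given height has 2^(height+1) - 1 nodes.
    return 2 ** (height + 1) - 1

print(n_nodes(10))  # 2047 nodes for the standard forest_height=10 workload
```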

Example

from perf_takehome import do_kernel_test

# Run a basic performance test
cycles = do_kernel_test(
    forest_height=10,
    rounds=16,
    batch_size=256
)
print(f"Completed in {cycles} cycles")

# Run with tracing enabled for debugging
do_kernel_test(
    forest_height=4,
    rounds=8,
    batch_size=64,
    trace=True,
    prints=True
)

Tests Class

Unit test class containing standard test cases for kernel validation.

test_kernel_cycles()

Standard performance test that measures cycle count for a full-scale workload.
import unittest
from perf_takehome import Tests

class MyTests(unittest.TestCase):
    def test_performance(self):
        # Runs: do_kernel_test(10, 16, 256)
        Tests().test_kernel_cycles()
Workload: forest_height=10, rounds=16, batch_size=256

test_kernel_trace()

Generates a Perfetto trace for performance visualization and debugging.
from perf_takehome import Tests

# Generate trace file
Tests().test_kernel_trace()

# View with: python watch_trace.py
Workload: forest_height=10, rounds=16, batch_size=256 with tracing enabled
Output: Creates trace.json for visualization in the Perfetto UI

build_mem_image()

Constructs a flat memory layout from tree and input data structures.
t (Tree, required)
  Tree structure containing height and node values.

inp (Input, required)
  Input batch containing indices, values, and round count.

Returns

Returns a list[int] representing the flattened memory image with the following header structure:
Offset  Description
------  ------------------------
0       Number of rounds
1       Number of nodes
2       Batch size
3       Tree height
4       Pointer to forest values
5       Pointer to input indices
6       Pointer to input values
7       Pointer to extra room

Example

from problem import Tree, Input, build_mem_image
import random

random.seed(123)
forest = Tree.generate(height=8)
inp = Input.generate(forest, batch_size=128, rounds=12)

# Build flat memory layout
mem = build_mem_image(forest, inp)

# Access header values
rounds = mem[0]
n_nodes = mem[1]
batch_size = mem[2]
forest_values_p = mem[4]
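
The header can also be used to recover the data regions. The sketch below hand-builds a toy image to show the idea; the values and the assumption that pointers are absolute offsets into mem are illustrative, not output of build_mem_image:

```python
# Toy memory image mirroring the header layout above (hand-built for
# illustration; not produced by build_mem_image).
n_nodes, batch_size, rounds, height = 7, 4, 2, 2
header_len = 8                        # eight header slots, offsets 0-7
forest_p = header_len                 # forest values follow the header
idx_p = forest_p + n_nodes            # then input indices
val_p = idx_p + batch_size            # then input values
extra_p = val_p + batch_size          # then extra room

mem = [rounds, n_nodes, batch_size, height, forest_p, idx_p, val_p, extra_p]
mem += [10, 20, 30, 40, 50, 60, 70]   # forest values (one per node)
mem += [0, 1, 2, 3]                   # input indices
mem += [5, 5, 5, 5]                   # input values

# Recover the forest values through the header: mem[4] is the pointer,
# mem[1] is the node count.
forest_values = mem[mem[4] : mem[4] + mem[1]]
print(forest_values)  # [10, 20, 30, 40, 50, 60, 70]
```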

Submission Tests

The tests/submission_tests.py module contains the official test suite for validating submissions.

CorrectnessTests

Validates kernel correctness across multiple randomized runs.
from tests.submission_tests import CorrectnessTests

# Run correctness validation
CorrectnessTests().test_kernel_correctness()
Test Coverage:
  • 8 randomized test cases
  • forest_height=10, rounds=16, batch_size=256
  • Validates output values against reference implementation

SpeedTests

Performance benchmarks with progressive difficulty thresholds.
Test                                  Threshold (cycles)  Description
------------------------------------  ------------------  ---------------------
test_kernel_speedup()                 < 147,734           Beat baseline
test_kernel_updated_starting_point()  < 18,532            Modern starter code
test_opus4_many_hours()               < 2,164             Extended optimization
test_opus45_casual()                  < 1,790             Best human 2-hour
test_opus45_2hr()                     < 1,579             AI 2-hour harness
test_sonnet45_many_hours()            < 1,548             Extended AI compute
test_opus45_11hr()                    < 1,487             11.5-hour harness
test_opus45_improved_harness()        < 1,363             Advanced harness
from tests.submission_tests import SpeedTests

# Run performance benchmarks
SpeedTests().test_kernel_speedup()
Baseline: 147,734 cycles. You don't need to pass every speed test to succeed; the difficulty curve is non-linear.

DebugInfo Class

Contains debugging metadata for kernel execution analysis.
scratch_map (dict[int, tuple[str, int]], required)
  Maps scratch memory addresses to (name, length) pairs for human-readable debugging.

Example

from perf_takehome import KernelBuilder

kb = KernelBuilder()
kb.alloc_scratch("tmp1", 1)
kb.alloc_scratch("tmp2", 1)

# Access debug info
debug_info = kb.debug_info()
print(debug_info.scratch_map)
# Output: {0: ('tmp1', 1), 1: ('tmp2', 1)}

Usage in Machine

The scratch_map enables readable variable names in traces and debug output:
from problem import Machine

machine = Machine(
    mem,
    instructions,
    debug_info,
    trace=True
)

# Debug output shows variable names instead of addresses
print(machine.scratch_map(machine.cores[0]))
# Output: {'tmp1': [42], 'tmp2': [17], ...}

Running Tests

Command Line

# Run all tests
python perf_takehome.py

# Run specific test
python perf_takehome.py Tests.test_kernel_cycles

# Run submission validation
python tests/submission_tests.py

# Generate and view trace
python perf_takehome.py Tests.test_kernel_trace
python watch_trace.py  # In separate terminal

Python API

import unittest
from perf_takehome import Tests
from tests.submission_tests import CorrectnessTests, SpeedTests

# Load and run tests
suite = unittest.TestLoader().loadTestsFromTestCase(Tests)
runner = unittest.TextTestRunner(verbosity=2)
result = runner.run(suite)

# Check specific thresholds
if __name__ == "__main__":
    unittest.main()
Do not modify files in the tests/ directory. The submission system uses a frozen copy of the simulator and reference implementation.

Performance Analysis

Baseline Reference

The baseline kernel implementation achieves 147,734 cycles for the standard workload (forest_height=10, rounds=16, batch_size=256).

Trace Visualization

Generate performance traces for detailed analysis:
  1. Run test with tracing:
     do_kernel_test(10, 16, 256, trace=True)
  2. View in Perfetto:
     python watch_trace.py
The trace shows:
  • Instruction execution timeline per engine
  • Scratch variable value changes
  • Slot utilization across cycles
  • Critical path analysis
