Overview
The testing module provides comprehensive utilities for validating kernel correctness and measuring performance. It includes functions for running individual tests, test classes for organized testing, and utilities for building memory images and debugging.
do_kernel_test()
Executes a kernel test with the specified parameters and validates correctness against the reference implementation.
Parameters
- forest_height: Height of the binary tree structure. Determines the number of nodes as 2^(forest_height+1) - 1 (2,047 nodes for forest_height=10).
- rounds: Number of rounds to execute the kernel traversal.
- batch_size: Number of parallel tree traversals to process in the batch.
- A random seed for reproducible test generation.
- A flag that enables Perfetto trace generation for performance visualization.
- A flag that enables detailed debug output during execution.
Returns
Returns the total cycle count (int) for the kernel execution.
Example
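The height parameter alone fixes the size of the tree, since a perfect binary tree of height h has 2^(h+1) - 1 nodes. The minimal sketch below evaluates that formula for the standard workload's forest_height=10; num_nodes is a hypothetical helper for illustration, not a function exported by the testing module.

```python
def num_nodes(height: int) -> int:
    """Node count of a perfect binary tree of the given height (hypothetical helper)."""
    if height < 0:
        raise ValueError("height must be non-negative")
    # A perfect binary tree of height h has 2^(h+1) - 1 nodes.
    return 2 ** (height + 1) - 1


# The standard workload uses forest_height=10.
print(num_nodes(10))  # 2047
```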
Tests Class
Unit test class containing standard test cases for kernel validation.
test_kernel_cycles()
Standard performance test that measures the cycle count for a full-scale workload.
Parameters: forest_height=10, rounds=16, batch_size=256
test_kernel_trace()
Generates a Perfetto trace for performance visualization and debugging.
Parameters: forest_height=10, rounds=16, batch_size=256, with tracing enabled
Output: Creates trace.json for visualization in the Perfetto UI
build_mem_image()
Constructs a flat memory layout from the tree and input data structures.
Parameters
- Tree structure containing the tree height and node values.
- Input batch containing indices, values, and the round count.
Returns
Returns a list[int] representing the flattened memory image, which begins with the following header:
| Offset | Description |
|---|---|
| 0 | Number of rounds |
| 1 | Number of nodes |
| 2 | Batch size |
| 3 | Tree height |
| 4 | Pointer to forest values |
| 5 | Pointer to input indices |
| 6 | Pointer to input values |
| 7 | Pointer to extra room |
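Because the header occupies fixed offsets 0 through 7, a consumer can decode it positionally. The sketch below gives each offset a name in table order; MemHeader, read_header, and the field names are hypothetical, and the pointer values in the usage line are illustrative, assuming the data regions sit back to back right after the header.

```python
from typing import NamedTuple


class MemHeader(NamedTuple):
    """Hypothetical named view of the header; field order follows the table."""
    rounds: int
    n_nodes: int
    batch_size: int
    forest_height: int
    forest_values_p: int
    inp_indices_p: int
    inp_values_p: int
    extra_room_p: int


HEADER_WORDS = len(MemHeader._fields)  # 8 words at offsets 0..7


def read_header(mem: list[int]) -> MemHeader:
    if len(mem) < HEADER_WORDS:
        raise ValueError("memory image is shorter than the header")
    # Offsets 0..7 map positionally onto the fields above.
    return MemHeader(*mem[:HEADER_WORDS])


# Illustrative prefix for the standard workload (pointer values are assumed).
header = read_header([16, 2047, 256, 10, 8, 2055, 2311, 2567])
print(header.batch_size, header.inp_values_p)  # 256 2311
```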
Example
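A minimal sketch of how such an image could be assembled. It assumes the data regions follow the header in the order the pointers are listed (forest values, input indices, input values, extra room) and that the extra room is a caller-chosen number of zeroed words. Tree, Input, and build_mem_image_sketch are hypothetical stand-ins, not the module's own classes or function.

```python
from dataclasses import dataclass

HEADER_WORDS = 8  # header offsets 0..7, one word each


@dataclass
class Tree:
    """Hypothetical stand-in: tree height plus a flat list of node values."""
    height: int
    values: list[int]


@dataclass
class Input:
    """Hypothetical stand-in: per-item indices and values plus the round count."""
    indices: list[int]
    values: list[int]
    rounds: int


def build_mem_image_sketch(t: Tree, inp: Input, extra_words: int = 64) -> list[int]:
    # Assumed layout: header, forest values, input indices, input values, extra room.
    forest_values_p = HEADER_WORDS
    inp_indices_p = forest_values_p + len(t.values)
    inp_values_p = inp_indices_p + len(inp.indices)
    extra_room_p = inp_values_p + len(inp.values)
    mem = [0] * (extra_room_p + extra_words)

    # Header fields in the order given by the table.
    mem[0] = inp.rounds
    mem[1] = len(t.values)
    mem[2] = len(inp.indices)
    mem[3] = t.height
    mem[4] = forest_values_p
    mem[5] = inp_indices_p
    mem[6] = inp_values_p
    mem[7] = extra_room_p

    # Copy each data region into place; the extra room stays zeroed.
    mem[forest_values_p:inp_indices_p] = t.values
    mem[inp_indices_p:inp_values_p] = inp.indices
    mem[inp_values_p:extra_room_p] = inp.values
    return mem


tree = Tree(height=1, values=[5, 6, 7])
batch = Input(indices=[0, 0], values=[10, 20], rounds=2)
mem = build_mem_image_sketch(tree, batch, extra_words=4)
print(mem[:HEADER_WORDS])  # [2, 3, 2, 1, 8, 11, 13, 15]
```

Storing pointers in the header lets kernel code locate every region with a single load instead of recomputing offsets from the sizes.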
Submission Tests
The tests/submission_tests.py module contains the official test suite for validating submissions.
CorrectnessTests
Validates kernel correctness across multiple randomized runs:
- 8 randomized test cases
- forest_height=10, rounds=16, batch_size=256
- Validates output values against the reference implementation
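This style of check is differential testing: generate inputs from fixed seeds, run the optimized and reference implementations on identical copies, and compare outputs. A minimal sketch under that assumption; check_against_reference, reference_step, and fast_step are hypothetical toy functions, not the real kernel, reference, or test code.

```python
import random
from typing import Callable

Kernel = Callable[[list[int]], list[int]]


def check_against_reference(
    kernel: Kernel,
    reference: Kernel,
    n_cases: int = 8,
    batch_size: int = 256,
    base_seed: int = 0,
) -> None:
    """Raise AssertionError on the first seed where kernel and reference disagree."""
    for case in range(n_cases):
        seed = base_seed + case
        rng = random.Random(seed)  # fixed per-case seed keeps failures reproducible
        values = [rng.randrange(2**32) for _ in range(batch_size)]
        # Pass copies so an in-place kernel cannot alter the reference's input.
        expected = reference(list(values))
        actual = kernel(list(values))
        if actual != expected:
            raise AssertionError(f"case {case} (seed {seed}): output mismatch")


def reference_step(values: list[int]) -> list[int]:
    # Toy scalar reference: one 32-bit multiply-add per element.
    out = []
    for x in values:
        out.append((x * 31 + 7) % 2**32)
    return out


def fast_step(values: list[int]) -> list[int]:
    # Toy "optimized" variant: same math, using a bit mask instead of modulo.
    return [(x * 31 + 7) & 0xFFFFFFFF for x in values]


check_against_reference(fast_step, reference_step)  # passes silently
```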
SpeedTests
Performance benchmarks with progressive difficulty thresholds.
| Test | Threshold (cycles) | Description |
|---|---|---|
| test_kernel_speedup() | < 147,734 | Beat baseline |
| test_kernel_updated_starting_point() | < 18,532 | Modern starter code |
| test_opus4_many_hours() | < 2,164 | Extended optimization |
| test_opus45_casual() | < 1,790 | Best human 2-hour |
| test_opus45_2hr() | < 1,579 | AI 2-hour harness |
| test_sonnet45_many_hours() | < 1,548 | Extended AI compute |
| test_opus45_11hr() | < 1,487 | 11.5 hour harness |
| test_opus45_improved_harness() | < 1,363 | Advanced harness |
Baseline: 147,734 cycles. You don't need to pass every speed test to succeed; the difficulty curve is non-linear.
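Since each threshold is a strict upper bound, a single cycle count determines which speed tests pass. A minimal sketch that encodes the table; SPEED_THRESHOLDS, passed_speed_tests, and speedup are hypothetical names, not part of tests/submission_tests.py.

```python
BASELINE_CYCLES = 147_734

# Thresholds from the SpeedTests table; a test passes when cycles < threshold.
SPEED_THRESHOLDS = {
    "test_kernel_speedup": 147_734,
    "test_kernel_updated_starting_point": 18_532,
    "test_opus4_many_hours": 2_164,
    "test_opus45_casual": 1_790,
    "test_opus45_2hr": 1_579,
    "test_sonnet45_many_hours": 1_548,
    "test_opus45_11hr": 1_487,
    "test_opus45_improved_harness": 1_363,
}


def passed_speed_tests(cycles: int) -> list[str]:
    # Dicts keep insertion order, so results follow the table's difficulty order.
    return [name for name, limit in SPEED_THRESHOLDS.items() if cycles < limit]


def speedup(cycles: int) -> float:
    return BASELINE_CYCLES / cycles


cycles = 1_500
print(len(passed_speed_tests(cycles)))  # 6
print(f"{speedup(cycles):.1f}x")        # 98.5x
```

Note that matching the baseline exactly (147,734 cycles) passes nothing, because every comparison is strict.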
DebugInfo Class
Contains debugging metadata for kernel execution analysis.
- scratch_map: Maps scratch memory addresses to (name, length) pairs for human-readable debugging.
Example
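One way such a map can be populated is by recording every scratch allocation as it is made. The sketch below pairs a hypothetical DebugInfo stand-in with a hypothetical bump allocator; it illustrates the (name, length) bookkeeping, not the module's actual allocation code.

```python
from dataclasses import dataclass, field


@dataclass
class DebugInfo:
    # Hypothetical stand-in: scratch address -> (name, length).
    scratch_map: dict[int, tuple[str, int]] = field(default_factory=dict)


class ScratchAllocator:
    """Hypothetical bump allocator that records each allocation for debugging."""

    def __init__(self) -> None:
        self.next_addr = 0
        self.debug = DebugInfo()

    def alloc(self, name: str, length: int = 1) -> int:
        addr = self.next_addr
        self.debug.scratch_map[addr] = (name, length)  # remember who owns this range
        self.next_addr += length
        return addr


alloc = ScratchAllocator()
idx = alloc.alloc("idx")
vals = alloc.alloc("vals", 8)
tmp = alloc.alloc("tmp")
print(alloc.debug.scratch_map)  # {0: ('idx', 1), 1: ('vals', 8), 9: ('tmp', 1)}
```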
Usage in Machine
The scratch_map enables readable variable names in traces and debug output.
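A trace or debug printer could use the map to turn a raw scratch address into a name, adding an element offset for multi-word allocations. A minimal sketch; describe_addr and the scratch[N] fallback format are hypothetical, not the Machine's actual output format.

```python
def describe_addr(scratch_map: dict[int, tuple[str, int]], addr: int) -> str:
    # Linear scan is fine for small maps; a sorted list plus bisect would scale better.
    for base, (name, length) in scratch_map.items():
        if base <= addr < base + length:
            return name if length == 1 else f"{name}[{addr - base}]"
    return f"scratch[{addr}]"  # address not covered by any named allocation


scratch_map = {0: ("idx", 1), 1: ("vals", 8), 9: ("tmp", 1)}
print(describe_addr(scratch_map, 4))   # vals[3]
print(describe_addr(scratch_map, 0))   # idx
print(describe_addr(scratch_map, 42))  # scratch[42]
```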
Running Tests
Command Line
Python API
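Assuming the Tests class is a standard unittest.TestCase subclass, a single test can be run programmatically with the standard library's unittest API. The sketch uses a toy Tests stand-in so it runs on its own; with the real module you would pass its Tests class instead. run_single_test is a hypothetical helper.

```python
import unittest


class Tests(unittest.TestCase):
    # Toy stand-in for the module's Tests class; the real test runs the kernel.
    def test_kernel_cycles(self) -> None:
        self.assertLess(1_500, 147_734)


def run_single_test(case_cls: type[unittest.TestCase], method: str) -> unittest.TestResult:
    # Instantiating a TestCase with a method name selects that single test.
    suite = unittest.TestSuite([case_cls(method)])
    return unittest.TextTestRunner(verbosity=2).run(suite)


result = run_single_test(Tests, "test_kernel_cycles")
print(result.wasSuccessful())  # True
```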
Performance Analysis
Baseline Reference
The baseline kernel implementation achieves 147,734 cycles for the standard workload (forest_height=10, rounds=16, batch_size=256).
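One way to read the baseline is per unit of work. Treating each (item, round) pair as one unit, the standard workload has 16 * 256 = 4,096 units. The quick calculation below uses cycles_per_step, a hypothetical helper; the metric is illustrative and not something the test suite reports.

```python
BASELINE_CYCLES = 147_734


def cycles_per_step(cycles: int, rounds: int = 16, batch_size: int = 256) -> float:
    # Treat each (item, round) pair as one unit of work (an illustrative metric).
    return cycles / (rounds * batch_size)


print(16 * 256)                                    # 4096
print(round(cycles_per_step(BASELINE_CYCLES), 2))  # 36.07
print(round(cycles_per_step(1_363), 2))            # 0.33
```

By this measure the tightest speed threshold corresponds to well under one cycle per (item, round) pair, which is only reachable if many units of work complete in the same cycle.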
Trace Visualization
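Generate performance traces for detailed analysis:
- Run test_kernel_trace (or another test with tracing enabled) to produce trace.json.
- Open trace.json in the Perfetto UI (ui.perfetto.dev).
The trace provides:
- Instruction execution timeline per engine
- Scratch variable value changes
- Slot utilization across cycles
- Critical path analysis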
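To illustrate what a per-engine timeline involves, here is a minimal sketch that converts a hypothetical (cycle, engine, instruction) schedule into the JSON Trace Event Format, which the Perfetto UI can open. It is not the module's own trace writer; the schedule format, the one-track-per-engine layout, and the mapping of one cycle to one microsecond are assumptions.

```python
import json


def to_trace_events(schedule: list[tuple[int, str, str]]) -> dict:
    """Convert (cycle, engine, instruction) tuples into Trace Event Format JSON."""
    engines = sorted({engine for _, engine, _ in schedule})
    tid = {engine: i for i, engine in enumerate(engines)}
    # Metadata events ("ph": "M") label each track with its engine name.
    events = [
        {"name": "thread_name", "ph": "M", "pid": 0, "tid": tid[e], "args": {"name": e}}
        for e in engines
    ]
    for cycle, engine, instr in schedule:
        # One complete event ("ph": "X") per instruction; ts and dur are in microseconds.
        events.append(
            {"name": instr, "ph": "X", "ts": cycle, "dur": 1, "pid": 0, "tid": tid[engine]}
        )
    return {"traceEvents": events}


schedule = [(0, "alu", "add"), (0, "load", "load"), (1, "alu", "xor")]
trace = to_trace_events(schedule)
text = json.dumps(trace)  # write this string to a .json file to open it in Perfetto
```

Putting each engine on its own track makes idle slots visible as gaps, which is the main thing to look for when judging utilization.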