Overview
The simulator generates traces in Chrome’s Trace Event Format for visualization in Perfetto. This allows you to see exactly which instructions are executing on each engine slot and how your scratch variables change over time.
Trace visualization only works in Chrome. If you encounter issues, you can drag trace.json directly onto https://ui.perfetto.dev/
What Tracing Does
When tracing is enabled, the simulator:
- Records every instruction executed on each engine slot (alu, load, store, flow)
- Tracks scratch variable updates with their values over time
- Generates a trace.json file in Chrome Trace Event Format
- Organizes data by core, engine type, and slot number
Generating a Trace
Run the trace test to generate a full-scale trace. The test uses forest_height=10, rounds=16, and batch_size=256, the same parameters used for performance evaluation.
How It Works
The trace is generated by passing trace=True to the test function (see problem.py:178-204).
Hot-Reloading Workflow
Start the trace server
In a separate terminal tab, start watch_trace.py. This starts a local server on port 8000 and opens your browser.
Reading the Trace Output
Process Organization
The trace organizes execution into processes and threads:
- Core Processes: Each core (0 to N_CORES-1) has its own process showing engine execution
- Scratch Processes: Each core has a “Core N Scratch” process showing variable updates
Engine Slots
Within each core process, you’ll see threads for each engine slot (see problem.py:48-55). Threads are named {engine}-{slot_number}, such as:
- alu-0 through alu-11 (12 scalar ALU slots)
- load-0 and load-1 (2 load slots)
- store-0 and store-1 (2 store slots)
- flow-0 (1 flow control slot)
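
The naming scheme above can be sketched in a few lines of Python. This is an illustration only: the `SLOT_COUNTS` dictionary and the `thread_names` helper are hypothetical names, and the counts are copied from the list above rather than read from problem.py.

```python
# Slot counts per engine, copied from the list above.
# SLOT_COUNTS is a hypothetical name, not necessarily the one used in problem.py.
SLOT_COUNTS = {"alu": 12, "load": 2, "store": 2, "flow": 1}


def thread_names(slot_counts):
    """Return every '{engine}-{slot_number}' label, in engine order."""
    return [
        f"{engine}-{slot}"
        for engine, count in slot_counts.items()
        for slot in range(count)
    ]


names = thread_names(SLOT_COUNTS)
print(len(names), names[0], names[-1])
```

With these counts there are 17 threads per core process, which is the maximum number of instructions a single cycle can show.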
Scratch Variables
In the “Core N Scratch” process, each scratch variable gets its own thread showing when and how it changes (see perf_takehome.py:94-109).
Understanding the Timeline
The X-axis represents cycle numbers. Each cycle can execute multiple instructions in parallel across different engine slots.
What to Look For
Parallel Execution
Instructions in the same cycle on different engine slots execute in parallel. Look for opportunities to pack more operations into each cycle.
Empty Slots
Gaps in engine slots indicate unused parallelism. Can you move instructions to fill these slots?
Bottlenecks
If one engine is constantly full while others are empty, you may have a bottleneck on that engine.
Memory Access Patterns
Watch load and store slots to understand memory access patterns and potential optimization opportunities.
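
The checks above can also be done numerically. The following is a minimal sketch, not part of the simulator: it assumes instructions appear as complete events (`"ph": "X"`) whose `ts` is the cycle number, and that threads are labeled by `thread_name` metadata events. Both are assumptions about how trace.json is laid out, and `slot_utilization` is a hypothetical helper.

```python
from collections import defaultdict


def slot_utilization(events):
    """Return {label: fraction of cycles in which that thread ran an instruction}.

    Assumes 'X' events carry the cycle number in 'ts' (an assumption about
    the simulator's output, not something the format guarantees).
    """
    names = {}                # (pid, tid) -> thread name from metadata events
    busy = defaultdict(set)   # (pid, tid) -> set of cycles with an instruction
    last_cycle = -1
    for ev in events:
        key = (ev.get("pid"), ev.get("tid"))
        if ev.get("ph") == "M" and ev.get("name") == "thread_name":
            names[key] = ev["args"]["name"]
        elif ev.get("ph") == "X":
            busy[key].add(ev["ts"])
            last_cycle = max(last_cycle, ev["ts"])
    total = last_cycle + 1
    if total == 0:
        return {}
    result = {}
    for key in sorted(set(names) | set(busy)):
        label = f"pid{key[0]}/{names.get(key, key[1])}"
        result[label] = len(busy.get(key, ())) / total
    return result


# Synthetic example: alu-0 is busy in all 4 cycles, load-0 in only one.
events = [
    {"name": "thread_name", "ph": "M", "pid": 0, "tid": 0, "args": {"name": "alu-0"}},
    {"name": "thread_name", "ph": "M", "pid": 0, "tid": 1, "args": {"name": "load-0"}},
    *[{"name": "op", "ph": "X", "pid": 0, "tid": 0, "ts": c, "dur": 1} for c in range(4)],
    {"name": "op", "ph": "X", "pid": 0, "tid": 1, "ts": 1, "dur": 1},
]
util = slot_utilization(events)
print(util)
```

A thread near 1.0 while others sit near 0.0 points to the bottleneck pattern described above. If scratch threads use the same event type they would be counted as well, so filter by process before calling the helper.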
Trace Format Details
The trace uses Chrome’s Trace Event Format. Key event types: see problem.py:151-177.
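
For orientation, here is a minimal sketch of the two event kinds that the Trace Event Format commonly uses for this sort of view: metadata events (`"ph": "M"`) that name processes and threads, and complete events (`"ph": "X"`) that draw a slice. Whether the simulator emits exactly these kinds is an assumption. The helper names and the instruction names `add` and `mul` are hypothetical.

```python
import json


def process_name_event(pid, name):
    """Metadata event that labels a process in the viewer."""
    return {"name": "process_name", "ph": "M", "pid": pid, "args": {"name": name}}


def thread_name_event(pid, tid, name):
    """Metadata event that labels a thread within a process."""
    return {"name": "thread_name", "ph": "M", "pid": pid, "tid": tid,
            "args": {"name": name}}


def complete_event(pid, tid, name, ts, dur=1):
    """Complete event: a slice starting at ts and lasting dur time units."""
    return {"name": name, "ph": "X", "pid": pid, "tid": tid, "ts": ts, "dur": dur}


events = [
    process_name_event(0, "Core 0"),
    thread_name_event(0, 0, "alu-0"),
    complete_event(0, 0, "add", ts=0),   # hypothetical instruction in cycle 0
    complete_event(0, 0, "mul", ts=1),   # hypothetical instruction in cycle 1
]
trace_text = json.dumps(events, indent=1)
print(trace_text)
```

The format nominally measures `ts` in microseconds; using it as a cycle index, as this sketch does, is what makes the X-axis read as cycle numbers.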
You can extend trace_post_step() or trace_slot() in problem.py to add custom trace information if needed.
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| Browser tab opens but shows error | Ensure you’ve run the trace test first to generate trace.json |
| Trace doesn’t refresh | Check that watch_trace.py is still running and re-run the test |
| Can’t see scratch variables | Verify you’re using alloc_scratch() with a name parameter |
| Empty trace | Make sure trace=True is passed to the Machine constructor |
Example: Debugging with Traces
When optimizing, use traces to:
- Identify unused cycles: Look for instruction bundles with empty engine slots
- Verify VLIW packing: Confirm multiple operations execute in the same cycle
- Track data flow: Follow a value through operations by watching scratch variables
- Compare implementations: Run traces before and after changes to visualize improvements
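
The before-and-after comparison can be reduced to a couple of numbers. This is a sketch under stated assumptions, not a feature of the simulator: it assumes you have already loaded each trace into a list of event dictionaries, that instructions are `"ph": "X"` events with the cycle number in `ts`, and that only engine slices (not scratch updates) are passed in. `summarize` and `make_slices` are hypothetical helpers.

```python
def summarize(events):
    """Return cycle count, instruction count, and instructions per cycle."""
    slices = [ev for ev in events if ev.get("ph") == "X"]
    if not slices:
        return {"cycles": 0, "instructions": 0, "per_cycle": 0.0}
    cycles = max(ev["ts"] for ev in slices) + 1   # cycles are numbered from 0
    return {
        "cycles": cycles,
        "instructions": len(slices),
        "per_cycle": len(slices) / cycles,
    }


def make_slices(cycles_used):
    """Build synthetic slices, one per instruction, each on its own thread."""
    return [
        {"name": "op", "ph": "X", "pid": 0, "tid": i, "ts": ts, "dur": 1}
        for i, ts in enumerate(cycles_used)
    ]


# Same four instructions: one per cycle before, two per cycle after packing.
before = summarize(make_slices([0, 1, 2, 3]))
after = summarize(make_slices([0, 0, 1, 1]))
speedup = before["cycles"] / after["cycles"]
print(before, after, speedup)
```

Note that the Trace Event Format allows the array form to omit its closing bracket, so a strict `json.load` on trace.json may need the file to be a complete JSON document first.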