Trace Visualization

Overview

The simulator generates traces in Chrome’s Trace Event Format for visualization in Perfetto. This allows you to see exactly what instructions are executing on each engine slot and how your scratch variables change over time.

Trace visualization only works in Chrome. If you encounter issues, open https://ui.perfetto.dev/ and drag trace.json directly onto the page.

What Tracing Does

When tracing is enabled, the simulator:

Records every instruction executed on each engine slot (alu, valu, load, store, flow)
Tracks scratch variable updates with their values over time
Generates a trace.json file in Chrome Trace Event Format
Organizes data by core, engine type, and slot number
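For orientation, an event in Chrome's Trace Event Format is a JSON object. The sketch below builds one metadata event and one complete event in Python. The field names (name, ph, pid, tid, ts, dur, args) come from the Trace Event Format; the helper names, the "add" label, and the choice of a complete ("X") event are illustrative assumptions, not necessarily what the simulator emits.

```python
import json


def thread_name_event(pid, tid, label):
    """Metadata event (ph == "M") that names a thread in the viewer."""
    return {"name": "thread_name", "ph": "M", "pid": pid, "tid": tid,
            "args": {"name": label}}


def complete_event(name, pid, tid, ts, dur=1, args=None):
    """Complete event (ph == "X"): a box starting at ts and lasting dur."""
    return {"name": name, "ph": "X", "pid": pid, "tid": tid,
            "ts": ts, "dur": dur, "args": args or {}}


# Hypothetical data: one instruction labeled "add" on slot alu-0 at cycle 5.
events = [
    thread_name_event(pid=0, tid=1, label="alu-0"),
    complete_event("add", pid=0, tid=1, ts=5),
]
trace_text = json.dumps(events, indent=2)
print(trace_text)
```

Here pid selects the process row (a core) and tid selects the thread row (an engine slot), which is how the data ends up grouped by core, engine type, and slot number.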

Generating a Trace

Run the trace test to generate a full-scale trace:

python perf_takehome.py Tests.test_kernel_trace

This test runs with forest_height=10, rounds=16, and batch_size=256 — the same parameters used for performance evaluation.

How It Works

The trace is generated by passing trace=True to the Machine constructor:

problem.py:178-204

value_trace = {}
machine = Machine(
    mem,
    kb.instrs,
    kb.debug_info(),
    n_cores=N_CORES,
    value_trace=value_trace,
    trace=True,  # Enables trace generation
)

Hot-Reloading Workflow

Run the trace test

python perf_takehome.py Tests.test_kernel_trace

This generates trace.json in your working directory.

Start the trace server

In a separate terminal tab:

python watch_trace.py

This starts a local server on port 8000 and opens your browser.

Open Perfetto

Click “Open Perfetto” in the browser tab that opens.

Make changes and re-run

Keep the browser tab open
Modify your kernel in perf_takehome.py
Re-run the trace test
The trace view automatically refreshes with your new trace

The hot-reloading workflow lets you iterate quickly without manually loading trace files each time.
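How watch_trace.py detects a new trace is not shown on this page. As a rough illustration only, one common way to notice that a file has been regenerated is to compare its modification time between checks. The function name trace_changed and the polling approach are assumptions made for this sketch.

```python
import os
import tempfile


def trace_changed(path, last_mtime_ns):
    """Return (changed, mtime_ns) for path; a missing file counts as unchanged."""
    try:
        mtime_ns = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return False, last_mtime_ns
    return mtime_ns != last_mtime_ns, mtime_ns


# Demonstration in a temporary directory so no real trace.json is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trace.json")
    missing = trace_changed(path, None)      # file not written yet
    with open(path, "w") as f:
        f.write("[")
    first = trace_changed(path, None)        # file now exists
    second = trace_changed(path, first[1])   # nothing rewritten since

print(missing, first[0], second[0])
```

A watcher built this way would call trace_changed in a loop and trigger a reload whenever the first element of the result is True.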

Reading the Trace Output

Process Organization

The trace organizes execution into processes and threads:

Core Processes

Each core (0 to N_CORES-1) has its own process showing engine execution

Scratch Processes

Each core has a “Core N Scratch” process showing variable updates

Engine Slots

Within each core process, you’ll see threads for each engine slot:

problem.py:48-55

SLOT_LIMITS = {
    "alu": 12,
    "valu": 6,
    "load": 2,
    "store": 2,
    "flow": 1,
    "debug": 64,
}

Each slot is labeled as {engine}-{slot_number}, such as:

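alu-0 through alu-11 (12 scalar ALU slots)
valu-0 through valu-5 (6 vector ALU slots)
load-0 and load-1 (2 load slots)
store-0 and store-1 (2 store slots)
flow-0 (1 flow control slot)

The debug engine appears in SLOT_LIMITS but is skipped when trace threads are created, so it has no rows in the trace.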
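The full set of per-core thread labels can be derived from SLOT_LIMITS by following the same loop that setup_trace() uses (shown under Trace Format Details): every engine except debug gets one thread per slot. The helper name slot_labels is introduced here for illustration.

```python
SLOT_LIMITS = {
    "alu": 12,
    "valu": 6,
    "load": 2,
    "store": 2,
    "flow": 1,
    "debug": 64,
}


def slot_labels(limits):
    """List "{engine}-{slot}" labels in dict order, skipping the debug engine."""
    return [
        f"{name}-{i}"
        for name, limit in limits.items()
        if name != "debug"
        for i in range(limit)
    ]


labels = slot_labels(SLOT_LIMITS)
print(len(labels))            # 23 threads per core
print(labels[0], labels[12])  # alu-0 valu-0
print(labels[-1])             # flow-0
```

With these limits each core process shows 23 engine threads: 12 alu, 6 valu, 2 load, 2 store, and 1 flow.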

Scratch Variables

In the “Core N Scratch” process, each scratch variable gets its own thread showing when and how it changes:

perf_takehome.py:94-109

tmp1 = self.alloc_scratch("tmp1")
tmp2 = self.alloc_scratch("tmp2")
tmp3 = self.alloc_scratch("tmp3")
init_vars = [
    "rounds",
    "n_nodes",
    "batch_size",
    "forest_height",
    "forest_values_p",
    "inp_indices_p",
    "inp_values_p",
]

Each event shows the variable’s value at that cycle.

Understanding the Timeline

The X-axis represents cycle numbers. Each cycle can execute multiple instructions in parallel across different engine slots.

What to Look For

Parallel Execution

Instructions in the same cycle on different engine slots execute in parallel. Look for opportunities to pack more operations into each cycle.

Empty Slots

Gaps in engine slots indicate unused parallelism. Can you move instructions to fill these slots?

Bottlenecks

If one engine is constantly full while others are empty, you may have a bottleneck on that engine.

Memory Access Patterns

Watch load and store slots to understand memory access patterns and potential optimization opportunities.
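The checks above can also be made numerically. The sketch below computes, per engine, the fraction of available slots that were used over a span of cycles. The input shape (a list of (cycle, engine) pairs, one per executed slot) is a hypothetical representation chosen for this example; the simulator does not necessarily expose its schedule this way.

```python
from collections import Counter

# Slot limits from problem.py, without the debug engine.
SLOT_LIMITS = {"alu": 12, "valu": 6, "load": 2, "store": 2, "flow": 1}


def engine_utilization(issued, n_cycles, limits=SLOT_LIMITS):
    """Fraction of available slots used per engine over n_cycles.

    issued: iterable of (cycle, engine) pairs, one per executed slot.
    """
    if n_cycles <= 0:
        raise ValueError("n_cycles must be positive")
    used = Counter(engine for _cycle, engine in issued)
    return {engine: used[engine] / (limit * n_cycles)
            for engine, limit in limits.items()}


# Hypothetical two-cycle schedule: both load slots busy, ALUs mostly idle.
issued = [
    (0, "load"), (0, "load"), (0, "alu"),
    (1, "load"), (1, "load"), (1, "alu"), (1, "alu"),
]
util = engine_utilization(issued, n_cycles=2)
for engine, frac in util.items():
    print(f"{engine:5s} {frac:.1%}")
```

In this made-up schedule the load engine is fully used while alu sits near idle, which is the pattern described under Bottlenecks: one engine saturated, others mostly empty.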

Trace Format Details

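The trace uses Chrome’s Trace Event Format. The excerpt below shows how setup_trace() opens trace.json and writes metadata events ("ph": "M") that name each core process and each engine-slot thread: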

problem.py:151-177

def setup_trace(self):
    self.trace = open("trace.json", "w")
    self.trace.write("[")
    tid_counter = 0
    self.tids = {}
    for ci, core in enumerate(self.cores):
        self.trace.write(
            f'{{"name": "process_name", "ph": "M", "pid": {ci}, "tid": 0, "args": {{"name":"Core {ci}"}}}},' + '\n'
        )
        for name, limit in SLOT_LIMITS.items():
            if name == "debug":
                continue
            for i in range(limit):
                tid_counter += 1
                self.trace.write(
                    f'{{"name": "thread_name", "ph": "M", "pid": {ci}, "tid": {tid_counter}, "args": {{"name":"{name}-{i}"}}}},' + '\n'
                )
                self.tids[(ci, name, i)] = tid_counter

You can extend trace_post_step() or trace_slot() in problem.py to add custom trace information if needed.
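If you want to inspect trace.json from a script, note that setup_trace() writes "[" followed by events that each end in a comma, so the file may not be strict JSON. The sketch below is a tolerant loader, assuming only that the file is a JSON array that may lack its closing bracket or end with a trailing comma. The function name load_trace_events is introduced for this example.

```python
import json


def load_trace_events(text):
    """Parse trace text that may lack a closing ']' or end with a trailing comma."""
    body = text.strip()
    if not body.startswith("["):
        raise ValueError("expected a JSON array of trace events")
    if body.endswith("]"):
        body = body[:-1].rstrip()      # re-added below after cleanup
    body = body.rstrip(",").rstrip()   # drop a dangling comma, if any
    return json.loads(body + "]")


# Sample text shaped like the metadata lines that setup_trace() writes.
sample = (
    '[{"name": "process_name", "ph": "M", "pid": 0, "tid": 0, '
    '"args": {"name":"Core 0"}},\n'
    '{"name": "thread_name", "ph": "M", "pid": 0, "tid": 1, '
    '"args": {"name":"alu-0"}},\n'
)
events = load_trace_events(sample)
thread_names = {
    (e["pid"], e["tid"]): e["args"]["name"]
    for e in events
    if e.get("name") == "thread_name"
}
print(thread_names)  # {(0, 1): 'alu-0'}
```

The resulting (pid, tid) to label mapping is handy for turning the numeric thread ids in later events back into names such as alu-0.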

Troubleshooting

If the browser tab doesn’t open automatically, manually navigate to http://localhost:8000

Common Issues

Issue	Solution
Browser tab opens but shows error	Ensure you’ve run the trace test first to generate `trace.json`
Trace doesn’t refresh	Check that `watch_trace.py` is still running and re-run the test
Can’t see scratch variables	Verify you’re using `alloc_scratch()` with a name parameter
Empty trace	Make sure `trace=True` is passed to the Machine constructor

Example: Debugging with Traces

When optimizing, use traces to:

Identify unused cycles — Look for instruction bundles with empty engine slots
Verify VLIW packing — Confirm multiple operations execute in the same cycle
Track data flow — Follow a value through operations by watching scratch variables
Compare implementations — Run traces before and after changes to visualize improvements

The trace viewer’s search and filter features are invaluable for focusing on specific variables or instruction types.
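For the before-and-after comparison, a small summary can complement the visual check. The sketch below assumes, hypothetically, that each executed slot appears as one non-metadata event whose ts field is the cycle number; verify this against your own trace.json before relying on it. The function name packing_summary and the sample events are made up for illustration.

```python
def packing_summary(events):
    """Summarize trace events: cycles spanned, ops executed, ops per cycle."""
    cycles = [e["ts"] for e in events if e.get("ph") != "M" and "ts" in e]
    if not cycles:
        return {"cycles": 0, "ops": 0, "ops_per_cycle": 0.0}
    span = max(cycles) - min(cycles) + 1
    return {"cycles": span, "ops": len(cycles),
            "ops_per_cycle": len(cycles) / span}


# Hypothetical traces: the same four ops, first serialized, then packed in pairs.
before_events = [{"ph": "X", "ts": c} for c in (0, 1, 2, 3)]
after_events = [{"ph": "X", "ts": c} for c in (0, 0, 1, 1)]

before = packing_summary(before_events)
after = packing_summary(after_events)
print("before:", before)
print("after: ", after)
```

A higher ops_per_cycle for the same number of ops means the instructions were packed into fewer cycles.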
