Skip to main content

Type Definitions

Instructions are defined using Python type hints in problem.py:
Engine = Literal["alu", "load", "store", "flow"]
Instruction = dict[Engine, list[tuple]]
From problem.py:14-15 - An instruction is a dictionary mapping engine names to lists of operation tuples.

Instruction Structure

Each instruction is a VLIW (Very Large Instruction Word) bundle that specifies operations for multiple engines:
instruction: Instruction = {
    "engine_name": [
        (operation, operand1, operand2, ...),
        (operation, operand1, operand2, ...),
    ],
    "another_engine": [
        (operation, operand1, operand2, ...),
    ]
}

Key Principles

All operations within an instruction execute in parallel during a single cycle. Each engine processes all its slots simultaneously.
Each engine has a maximum number of operations it can execute per cycle:
  • ALU: 12 slots
  • VALU: 6 slots
  • LOAD: 2 slots
  • STORE: 2 slots
  • FLOW: 1 slot
  • DEBUG: 64 slots (doesn’t count as a cycle)
All writes take effect at the end of the cycle after all reads complete. This means you can safely read and write the same location in one instruction - the read sees the old value.

Operand Format

Critical Rule: Almost every number in an instruction is a scratch address, not a value.

Scratch Addresses

Most operands reference locations in scratch memory:
{
    "alu": [
        ("+", 10, 5, 6)  # scratch[10] = scratch[5] + scratch[6]
    ]
}

Exceptions

OperationExceptionDescription
constSecond operand is immediate("const", dest, 42) - dest gets value 42
add_immThird operand is immediate("add_imm", dest, src, 5) - adds 5 to src
jumpOperand is instruction index("jump", 10) - jumps to instruction 10
cond_jumpSecond operand is instruction index("cond_jump", cond, 10)
cond_jump_relSecond operand is offset("cond_jump_rel", cond, -5) - jumps back 5 instructions
storeFirst operand is indirect("store", addr, src) - stores to mem[scratch[addr]]
From problem.py:86-94 - The machine documentation states: “In general every number in an instruction is a scratch address except for const and jump, and except for store and some flow instructions the first operand is the destination.”

Destination Convention

For most operations, the first operand is the destination:
("+", dest, a, b)      # dest = a + b
("load", dest, addr)   # dest = mem[addr]
("select", dest, c, a, b)  # dest = a if c else b
Exceptions:
("store", addr, src)   # mem[addr] = src (src is second)
("cond_jump", cond, addr)  # Jumps to addr if cond (no destination)

Vector Operations

Vector operations work on 8 contiguous scratch addresses (VLEN=8):
{
    "valu": [
        ("vbroadcast", 100, 50),  # scratch[100:108] = scratch[50]
        ("+", 200, 100, 110)      # scratch[200+i] = scratch[100+i] + scratch[110+i] for i in 0..7
    ]
}
When you specify address 100 for a vector operation, it operates on addresses 100, 101, 102, 103, 104, 105, 106, 107.

Example Instruction Bundles

Simple Arithmetic

# Compute: z = (x + y) * 2
{
    "load": [
        ("const", 2, 2)      # scratch[2] = 2
    ],
    "alu": [
        ("+", 3, 0, 1),      # scratch[3] = scratch[0] + scratch[1]
        ("*", 4, 3, 2)       # scratch[4] = scratch[3] * scratch[2]
    ]
}

Memory Operations

# Load from memory, process, and store back
{
    "load": [
        ("load", 10, 0),     # scratch[10] = mem[scratch[0]]
        ("load", 11, 1)      # scratch[11] = mem[scratch[1]]
    ],
    "alu": [
        ("+", 12, 10, 11)    # scratch[12] = scratch[10] + scratch[11]
    ],
    "store": [
        ("store", 2, 12)     # mem[scratch[2]] = scratch[12]
    ]
}

Vector Processing

# Vector multiply-add: result[i] = a[i] * b[i] + c[i]
{
    "load": [
        ("vload", 100, 50),  # Load 8 words from mem[scratch[50]:scratch[50]+8]
        ("vload", 110, 51)   # Load 8 words from mem[scratch[51]:scratch[51]+8]
    ],
    "valu": [
        ("vbroadcast", 120, 52),           # Broadcast scalar to vector
        ("multiply_add", 200, 100, 110, 120)  # Fused multiply-add
    ]
}

Conditional Execution

# If x < y then z = x else z = y
{
    "alu": [
        ("<", 10, 0, 1)      # scratch[10] = 1 if scratch[0] < scratch[1]
    ],
    "flow": [
        ("select", 2, 10, 0, 1)  # scratch[2] = scratch[0] if scratch[10] else scratch[1]
    ]
}

Loop Control

# Loop: for i in range(n)
# Assuming scratch[0] = i, scratch[1] = n, scratch[2] = 1
{
    "alu": [
        ("+", 0, 0, 2),      # i++
        ("<", 10, 0, 1)      # scratch[10] = (i < n)
    ],
    "flow": [
        ("cond_jump_rel", 10, -3)  # Jump back 3 instructions if i < n
    ]
}

VLIW Packing Strategy

1

Identify Independent Operations

Find operations that don’t depend on each other’s results.
2

Group by Engine

Organize operations by which engine executes them.
3

Pack Within Slot Limits

Combine up to the slot limit for each engine in a single instruction.
4

Handle Dependencies

Operations that depend on previous results must go in a later instruction.

Packing Example

Poor packing (4 cycles):
[
    {"alu": [("*", 10, 0, 1)]},
    {"alu": [("*", 11, 2, 3)]},
    {"alu": [("*", 12, 4, 5)]},
    {"alu": [("*", 13, 6, 7)]}
]
Good packing (1 cycle):
[
    {
        "alu": [
            ("*", 10, 0, 1),
            ("*", 11, 2, 3),
            ("*", 12, 4, 5),
            ("*", 13, 6, 7)
        ]
    }
]

Dependency Example

Must use 2 cycles (second operation depends on first):
[
    {
        "alu": [("*", 10, 0, 1)]  # Compute product
    },
    {
        "alu": [("+", 11, 10, 2)]  # Use product (depends on previous)
    }
]
You cannot pack dependent operations into the same instruction because writes only take effect at the end of the cycle.

Real-World Example from build_kernel

Here’s a typical instruction from the hash computation:
# XOR with constant and compute shifts
{
    "load": [
        ("const", 50, 0xC761C23C)  # Load constant
    ],
    "alu": [
        ("^", 10, 5, 50),          # XOR with constant
        ("<<", 11, 10, 51),        # Left shift
        (">>", 12, 10, 52)         # Right shift (uses same source)
    ],
    "flow": [
        ("select", 13, 11, 12)     # Select based on condition
    ]
}
This example shows VLIW packing: multiple independent operations (load constant, XOR, two shifts) execute in parallel in a single cycle.

Debugging Instructions

Use debug instructions to verify intermediate values during development:
{
    "alu": [("*", 10, 5, 6)],
    "debug": [
        ("compare", 10, "expected_product"),
        ("comment", "Multiply x by y")
    ]
}
Debug instructions don’t consume cycles and can be disabled in production.

Engine Reference

Detailed operation reference for each engine

Architecture Guide

Understanding VLIW and SIMD concepts

Writing Kernels

How to write efficient kernel code

Machine Class

Machine simulator implementation

Build docs developers (and LLMs) love