The Tiny TPU uses a fixed-width instruction set architecture (ISA) to control all subsystems including the Unified Buffer, Systolic Array, and Vector Processing Unit.

Instruction format

The instruction bus is 88 bits wide ([87:0]) and divided into fields that directly control TPU subsystems. Each field maps to specific control signals or data values.
Note: the README describes a 94-bit instruction format, but the hardware implementation in src/control_unit.sv uses 88 bits. This page documents the implemented 88-bit format.

Instruction field layout

Instructions are organized into sequential bit fields, each serving a specific control purpose:
| Bit range | Width | Field | Purpose |
|-----------|-------|-------|---------|
| [4:0] | 5 bits | Control signals | System-wide 1-bit control flags |
| [6:5] | 2 bits | Column size | UB read column count |
| [14:7] | 8 bits | Row size | UB read row count (0-255) |
| [16:15] | 2 bits | Read address | UB read address pointer |
| [19:17] | 3 bits | Pointer select | UB data routing selector |
| [35:20] | 16 bits | Host data 1 | First host write data word |
| [51:36] | 16 bits | Host data 2 | Second host write data word |
| [55:52] | 4 bits | Data pathway | VPU pipeline configuration |
| [71:56] | 16 bits | Batch scaling | Scaling factor of 2 / batch size (inverse batch size × 2) |
| [87:72] | 16 bits | Leak factor | Activation function leak parameter |
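The field layout above can be exercised from the host side with a small packing helper. The following is a minimal Python sketch, not code from the project: the short field names (`control`, `col_size`, and so on) and the functions `pack` and `unpack` are hypothetical labels chosen for illustration, and only the bit positions and widths come from the table.

```python
# Field layout from the table above: (name, lowest bit, width in bits).
# The short names are hypothetical labels, not identifiers from the RTL.
FIELDS = [
    ("control", 0, 5),
    ("col_size", 5, 2),
    ("row_size", 7, 8),
    ("read_addr", 15, 2),
    ("ptr_select", 17, 3),
    ("host_data_1", 20, 16),
    ("host_data_2", 36, 16),
    ("data_pathway", 52, 4),
    ("batch_scaling", 56, 16),
    ("leak_factor", 72, 16),
]

INSTRUCTION_WIDTH = 88

# Sanity check: the fields are contiguous and fill the word exactly.
_next_bit = 0
for _name, _lo, _width in FIELDS:
    assert _lo == _next_bit, f"gap or overlap before {_name}"
    _next_bit = _lo + _width
assert _next_bit == INSTRUCTION_WIDTH


def pack(**values):
    """Pack named field values into an 88-bit integer; omitted fields are 0."""
    unknown = set(values) - {name for name, _, _ in FIELDS}
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for name, lo, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack(word):
    """Split an 88-bit integer back into a dict of field values."""
    return {
        name: (word >> lo) & ((1 << width) - 1)
        for name, lo, width in FIELDS
    }
```

For example, `pack(row_size=255, leak_factor=0x0080)` places 255 in bits [14:7] and 0x0080 in bits [87:72], and `unpack` on the result returns the same values with every other field at zero. The contiguity check at import time also confirms that the listed widths add up to 88 bits.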

Fixed-point representation

Data fields use 16-bit fixed-point format with 8 fractional bits:
  • Format: Q8.8 (8 integer bits, 8 fractional bits)
  • Range: -128.0 to 127.99609375
  • Precision: 1/256 = 0.00390625
This format is used for:
  • Host write data (ub_wr_host_data_in_1, ub_wr_host_data_in_2)
  • Batch size scaling factor (inv_batch_size_times_two_in)
  • Activation leak factor (vpu_leak_factor_in)
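To make the Q8.8 encoding concrete, here is a minimal Python sketch of converting between real values and 16-bit words. It assumes the fields are signed two's-complement values, which matches the stated range of -128.0 to 127.99609375 but is not confirmed here against the RTL. The helper names `to_q8_8` and `from_q8_8` are hypothetical.

```python
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS  # 256 steps per unit


def to_q8_8(x):
    """Encode a real number as a 16-bit Q8.8 word.

    Assumes two's-complement encoding. Uses Python's round(), which
    rounds exact halves to the nearest even integer.
    """
    raw = round(x * SCALE)
    if not -(1 << 15) <= raw <= (1 << 15) - 1:
        raise OverflowError(f"{x} is outside the Q8.8 range")
    return raw & 0xFFFF


def from_q8_8(word):
    """Decode a 16-bit Q8.8 word back to a float."""
    raw = word & 0xFFFF
    if raw & 0x8000:  # sign bit set: value is negative
        raw -= 1 << 16
    return raw / SCALE
```

As a usage example, a batch size of 4 gives a batch scaling value of 2 / 4 = 0.5, which `to_q8_8` encodes as 0x0080. Values that are not multiples of 1/256 are quantized: a leak factor of 0.01 encodes as 0x0003, which decodes to 0.01171875.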

Control flow

Instructions are loaded directly into an on-chip instruction buffer. The control unit decodes each instruction and asserts the appropriate signals to coordinate:
  1. Data movement - Reading from and writing to the Unified Buffer
  2. Computation - Configuring the systolic array and VPU pipeline
  3. Synchronization - Managing valid/ready handshakes between modules

Related

See the complete control signal reference for detailed signal descriptions.

Instruction execution model

The TPU follows a simple execution model:
  • Instructions execute in program order
  • Most operations complete in one clock cycle
  • Data transfers may take multiple cycles depending on size parameters
  • No instruction-level parallelism or out-of-order execution
For real-world instruction sequences, see the examples page, which shows forward and backward pass implementations.
