Instruction format
The instruction bus is 88 bits wide ([87:0]) and divided into fields that directly control TPU subsystems. Each field maps to specific control signals or data values.
The README documentation references a 94-bit instruction format, but the actual hardware implementation in `src/control_unit.sv` uses 88 bits. This documentation reflects the implemented 88-bit format.
Instruction field layout
Instructions are organized into sequential bit fields, each serving a specific control purpose:

| Bit range | Width | Field | Purpose |
|---|---|---|---|
| [4:0] | 5 bits | Control signals | System-wide 1-bit control flags |
| [6:5] | 2 bits | Column size | UB read column count |
| [14:7] | 8 bits | Row size | UB read row count (0-255) |
| [16:15] | 2 bits | Read address | UB read address pointer |
| [19:17] | 3 bits | Pointer select | UB data routing selector |
| [35:20] | 16 bits | Host data 1 | First host write data word |
| [51:36] | 16 bits | Host data 2 | Second host write data word |
| [55:52] | 4 bits | Data pathway | VPU pipeline configuration |
| [71:56] | 16 bits | Batch scaling | Inverse batch size × 2 factor |
| [87:72] | 16 bits | Leak factor | Activation function leak parameter |
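
The field layout in the table can be modelled in software, for example when generating instruction words for a testbench. The following is a minimal Python sketch, not the hardware implementation: the field names (`control`, `col_size`, and so on) and both helper functions are hypothetical labels introduced here, and only the bit positions and widths come from the table.

```python
# Field layout taken from the table: name -> (lowest bit, width in bits).
# The names are hypothetical labels for this sketch, not RTL signal names.
FIELDS = {
    "control": (0, 5),
    "col_size": (5, 2),
    "row_size": (7, 8),
    "read_addr": (15, 2),
    "ptr_select": (17, 3),
    "host_data_1": (20, 16),
    "host_data_2": (36, 16),
    "data_pathway": (52, 4),
    "batch_scaling": (56, 16),
    "leak_factor": (72, 16),
}

INSTRUCTION_WIDTH = 88


def pack_instruction(**values):
    """Pack named field values into one 88-bit integer; unset fields are 0."""
    word = 0
    for name, value in values.items():
        low, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def unpack_instruction(word):
    """Split an 88-bit integer back into a dict of field values."""
    if not 0 <= word < (1 << INSTRUCTION_WIDTH):
        raise ValueError("instruction does not fit in 88 bits")
    return {
        name: (word >> low) & ((1 << width) - 1)
        for name, (low, width) in FIELDS.items()
    }


# Example: a 4-row, 2-column read with one host data word set.
word = pack_instruction(row_size=4, col_size=2, host_data_1=0x0180)
fields = unpack_instruction(word)
print(f"{word:022x}")  # 88 bits printed as 22 hex digits
```

Keeping the layout in a single table-like dictionary means the packer and unpacker cannot drift apart, and the widths can be checked to sum to 88.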
Fixed-point representation
Data fields use a 16-bit fixed-point format with 8 fractional bits:
- Format: Q8.8 (8 integer bits, 8 fractional bits)
- Range: -128.0 to 127.99609375
- Precision: 1/256 = 0.00390625

The following fields use this format:
- Host write data (`ub_wr_host_data_in_1`, `ub_wr_host_data_in_2`)
- Batch size scaling factor (`inv_batch_size_times_two_in`)
- Activation leak factor (`vpu_leak_factor_in`)
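
A short Python sketch of the Q8.8 conversion may help when preparing values for these fields. It assumes a signed two's-complement encoding, which the stated range of -128.0 to 127.99609375 implies, and it rounds to the nearest step using Python's `round`; the hardware or host tooling may round differently. The function names `to_q88` and `from_q88` are hypothetical.

```python
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS  # 256 steps per unit


def to_q88(x):
    """Encode a real number as a 16-bit two's-complement Q8.8 bit pattern."""
    raw = round(x * SCALE)  # assumption: round to nearest representable step
    if not -(1 << 15) <= raw < (1 << 15):
        raise OverflowError(f"{x} is outside the Q8.8 range")
    return raw & 0xFFFF


def from_q88(bits):
    """Decode a 16-bit Q8.8 bit pattern back to a float."""
    if bits & 0x8000:  # sign bit set: undo the two's-complement wrap
        bits -= 1 << 16
    return bits / SCALE


print(hex(to_q88(1.5)))        # 1.5 * 256 = 384
print(from_q88(to_q88(0.01)))  # nearest step to 0.01 is 3/256
```

A value such as 0.01 is not exactly representable: it quantizes to 3/256 = 0.01171875, which matters when choosing small leak factors.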
Control flow
Instructions are loaded directly into an on-chip instruction buffer. The control unit decodes each instruction and asserts the appropriate signals to coordinate:
- Data movement: reading from and writing to the Unified Buffer
- Computation: configuring the systolic array and VPU pipeline
- Synchronization: managing valid/ready handshakes between modules
Related
See the complete control signal reference for detailed signal descriptions
Instruction execution model
The TPU follows a simple execution model:
- Instructions execute in program order
- Most operations complete in one clock cycle
- Data transfers may take multiple cycles depending on size parameters
- No instruction-level parallelism or out-of-order execution