This page documents every control signal in the 88-bit instruction format, organized by functional group.

System control signals

Bits [4:0] contain five 1-bit control flags:
sys_switch_in
bit
default:"0"
System mode switch - controls whether the TPU is actively processing
  • 1 = System active, computation in progress
  • 0 = System idle
Bit position: [0]
ub_rd_start_in
bit
default:"0"
Unified Buffer read transaction trigger
  • 1 = Start a new read transaction
  • 0 = No read initiated
Bit position: [1]
ub_rd_transpose
bit
default:"0"
Unified Buffer read transpose mode
  • 1 = Transpose data during read (for loading transposed weight matrices)
  • 0 = Normal read without transpose
Bit position: [2]
ub_wr_host_valid_in_1
bit
default:"0"
Host write channel 1 valid flag
  • 1 = Data on channel 1 is valid, write to UB
  • 0 = No valid data on channel 1
Bit position: [3]
ub_wr_host_valid_in_2
bit
default:"0"
Host write channel 2 valid flag
  • 1 = Data on channel 2 is valid, write to UB
  • 0 = No valid data on channel 2
Bit position: [4]
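The five flags above can be packed into the low bits of an instruction word. The Python sketch below is illustrative only: the helper name `pack_system_flags` and its keyword arguments are hypothetical and not part of the project; only the bit positions come from the descriptions above.

```python
# Bit positions taken from the flag descriptions above.
SYS_SWITCH_IN = 0
UB_RD_START_IN = 1
UB_RD_TRANSPOSE = 2
UB_WR_HOST_VALID_IN_1 = 3
UB_WR_HOST_VALID_IN_2 = 4


def pack_system_flags(sys_switch=0, rd_start=0, rd_transpose=0,
                      wr_valid_1=0, wr_valid_2=0):
    """Return bits [4:0] of the instruction word (hypothetical helper)."""
    flags = [
        (sys_switch, SYS_SWITCH_IN),
        (rd_start, UB_RD_START_IN),
        (rd_transpose, UB_RD_TRANSPOSE),
        (wr_valid_1, UB_WR_HOST_VALID_IN_1),
        (wr_valid_2, UB_WR_HOST_VALID_IN_2),
    ]
    word = 0
    for value, position in flags:
        if value not in (0, 1):
            raise ValueError(f"flag at bit {position} must be 0 or 1")
        word |= value << position
    return word


# System active, start a transposed read, no host writes.
low_bits = pack_system_flags(sys_switch=1, rd_start=1, rd_transpose=1)
print(f"{low_bits:05b}")  # 00111
```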

Unified Buffer read control

These fields control data reads from the Unified Buffer:
ub_rd_col_size
2-bit
default:"0"
Number of columns to read from UB
Value   Columns
00      0
01      1
10      2
11      3
Bit position: [6:5]
ub_rd_row_size
8-bit
default:"0"
Number of rows to read from UB (0-255)
Specifies how many rows of data to read in the current transaction.
Examples:
  • 0x08 = Read 8 rows
  • 0x04 = Read 4 rows (batch size)
  • 0x01 = Read 1 row
Bit position: [14:7]
ub_rd_addr_in
2-bit
default:"0"
Unified Buffer read address pointer
Selects the starting address in UB for the read transaction.
Note: the actual implementation uses 2 bits [16:15], providing 4 possible addresses. The README documentation shows 8 bits [22:15], which does not match the hardware.
Bit position: [16:15]
ub_ptr_sel
3-bit
default:"0"
Unified Buffer pointer select - routes UB read data to different modules
Value   Destination
000     Systolic array (left input)
001     Systolic array (top input/weights)
010     VPU bias module
011     VPU loss module
100     VPU activation derivative module
101     VPU gradient descent (bias)
110     VPU gradient descent (weights)
Example: 3'b001 = route read pointer to weight inputs of systolic array
Bit position: [19:17]
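The read-control fields above occupy bits [19:5]. As a rough sketch, they could be assembled as follows. The function `pack_ub_read` is a hypothetical helper, and the 2-bit width for `ub_rd_addr_in` follows the bit positions given on this page rather than the README.

```python
def pack_ub_read(col_size, row_size, addr, ptr_sel):
    """Pack the UB read-control fields into bits [19:5] (hypothetical helper)."""
    fields = [
        # (name, value, lowest bit, width in bits)
        ("ub_rd_col_size", col_size, 5, 2),
        ("ub_rd_row_size", row_size, 7, 8),
        ("ub_rd_addr_in", addr, 15, 2),
        ("ub_ptr_sel", ptr_sel, 17, 3),
    ]
    word = 0
    for name, value, low, width in fields:
        # Reject values that would spill into a neighbouring field.
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


# Read 4 rows of 2 columns from address 1 and route them to the
# systolic array weight inputs (ub_ptr_sel = 3'b001).
word = pack_ub_read(col_size=2, row_size=4, addr=1, ptr_sel=0b001)
print(hex(word))  # 0x28240
```

The range check matters because an out-of-range value would silently overwrite the adjacent field once shifted into place.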

Host write data

The TPU provides two write channels for loading data into the Unified Buffer:
ub_wr_host_data_in_1
16-bit fixed-point
default:"0"
First host write data word
Fixed-point format: Q8.8 (8 integer bits, 8 fractional bits)
Example: 0xABCD writes the value represented by this fixed-point encoding
Bit position: [35:20]
ub_wr_host_data_in_2
16-bit fixed-point
default:"0"
Second host write data word
Fixed-point format: Q8.8 (8 integer bits, 8 fractional bits)
Enables writing two values per instruction cycle for faster data loading.
Example: 0x1234
Bit position: [51:36]
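To illustrate the Q8.8 encoding and the two write channels, here is a minimal Python sketch. The helpers `to_q8_8` and `pack_host_write` are hypothetical. This page does not state whether the data words are signed, so the sketch accepts only non-negative values, and setting each valid flag together with its data word is an assumption based on the flag descriptions.

```python
def to_q8_8(value):
    """Encode a non-negative real number as a 16-bit Q8.8 word.

    Q8.8 stores value * 256 as an integer. Signedness is not specified
    on this page, so negative inputs are rejected (assumption).
    """
    raw = round(value * 256)
    if not 0 <= raw <= 0xFFFF:
        raise ValueError(f"{value} is outside the range handled by this sketch")
    return raw


def pack_host_write(value_1=None, value_2=None):
    """Place up to two data words and their valid flags (hypothetical helper)."""
    word = 0
    if value_1 is not None:
        word |= 1 << 3                   # ub_wr_host_valid_in_1, bit [3]
        word |= to_q8_8(value_1) << 20   # ub_wr_host_data_in_1, bits [35:20]
    if value_2 is not None:
        word |= 1 << 4                   # ub_wr_host_valid_in_2, bit [4]
        word |= to_q8_8(value_2) << 36   # ub_wr_host_data_in_2, bits [51:36]
    return word


word = pack_host_write(1.5, 0.25)
print(f"0x{(word >> 20) & 0xFFFF:04X}")  # 0x0180
print(f"0x{(word >> 36) & 0xFFFF:04X}")  # 0x0040
```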

Vector Processing Unit control

vpu_data_pathway
4-bit
default:"0"
VPU pipeline configuration - selects which modules are active
Value   Configuration                Use case
0000    Bypass                       Gradient calculation
0001    Activation derivative only   Backpropagation
1100    Bias + Activation            Forward pass layer 1
1111    Bias + Activation + Loss     Forward pass final layer
See VPU data pathways for complete routing details.
Bit position: [55:52]
inv_batch_size_times_two_in
16-bit fixed-point
default:"0"
Precomputed scaling factor for MSE loss backpropagation
Fixed-point format: Q8.8
Calculation: 2 / batch_size
Examples:
  • Batch size 4: 0x0080 (2/4 = 0.5 in Q8.8)
  • Batch size 32: 0x0010 (2/32 = 0.0625 in Q8.8)
Bit position: [71:56]
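The two examples above can be reproduced with a short calculation. This is a sketch only: the function name is hypothetical, and rounding to the nearest Q8.8 step is an assumption that matters for batch sizes where 2 / batch_size is not exactly representable.

```python
def inv_batch_size_times_two(batch_size):
    """Return 2 / batch_size as a 16-bit Q8.8 word (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Q8.8 stores value * 256; rounding mode is an assumption.
    raw = round((2 / batch_size) * 256)
    if raw > 0xFFFF:
        raise ValueError("result does not fit in 16 bits")
    return raw


for batch_size in (4, 32):
    print(batch_size, f"0x{inv_batch_size_times_two(batch_size):04X}")
# 4 0x0080
# 32 0x0010
```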
vpu_leak_factor_in
16-bit fixed-point
default:"0"
Leak factor for Leaky ReLU activation function
Fixed-point format: Q8.8
Common values:
  • 0x0080 = 0.5 (typical for Leaky ReLU)
  • 0x0019 ≈ 0.098 (25/256, approximately 0.1; common alternative)
  • 0x0000 = 0.0 (standard ReLU)
Example: 0x00A0 = 0.625 (160/256 in Q8.8)
Bit position: [87:72]
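Taken together, the fields documented on this page account for all 88 bits of the instruction. The Python sketch below collects the bit positions into one table and round-trips a word through hypothetical `encode` and `decode` helpers; the field names and positions come from this page, while the helpers themselves are illustrative and not part of the project.

```python
# (name, lowest bit, width) for every field documented on this page.
FIELDS = [
    ("sys_switch_in", 0, 1),
    ("ub_rd_start_in", 1, 1),
    ("ub_rd_transpose", 2, 1),
    ("ub_wr_host_valid_in_1", 3, 1),
    ("ub_wr_host_valid_in_2", 4, 1),
    ("ub_rd_col_size", 5, 2),
    ("ub_rd_row_size", 7, 8),
    ("ub_rd_addr_in", 15, 2),
    ("ub_ptr_sel", 17, 3),
    ("ub_wr_host_data_in_1", 20, 16),
    ("ub_wr_host_data_in_2", 36, 16),
    ("vpu_data_pathway", 52, 4),
    ("inv_batch_size_times_two_in", 56, 16),
    ("vpu_leak_factor_in", 72, 16),
]
# Sanity check: the widths add up to the 88-bit instruction size.
assert sum(width for _, _, width in FIELDS) == 88


def encode(**values):
    """Build an 88-bit word; unspecified fields default to 0."""
    unknown = set(values) - {name for name, _, _ in FIELDS}
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, low, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def decode(word):
    """Split an 88-bit word back into its named fields."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, low, width in FIELDS}


word = encode(sys_switch_in=1, vpu_data_pathway=0b1100,
              vpu_leak_factor_in=0x00A0)
fields = decode(word)
print(f"{word:022x}")                      # 88 bits = 22 hex digits
print(fields["vpu_leak_factor_in"] / 256)  # 0.625
```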

Signal timing

Control signals follow these timing conventions:
  • Start signals (ub_rd_start_in): Assert for one cycle to initiate operation
  • Valid signals (ub_wr_host_valid_in_*): Hold high while data is valid
  • Mode signals (ub_rd_transpose, sys_switch_in): Set before starting operation
  • Data signals: Must be stable when corresponding valid signal is high
In the test sequences, start signals are typically asserted for one cycle, then cleared while the operation completes.
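The conventions above can be sketched as a per-cycle list of instruction words for a single read. This is an illustration under stated assumptions, not the project's test bench: `read_sequence` is a hypothetical helper, the number of wait cycles is arbitrary because this page does not say how long a read takes, and holding `sys_switch_in` high for the whole sequence is an assumption.

```python
SYS_SWITCH_IN = 1 << 0
UB_RD_START_IN = 1 << 1
UB_RD_TRANSPOSE = 1 << 2


def read_sequence(transpose, wait_cycles):
    """Return one instruction word (low bits only) per clock cycle."""
    # Mode signals are set before the operation starts and then held.
    mode = SYS_SWITCH_IN | (UB_RD_TRANSPOSE if transpose else 0)
    sequence = [mode]                       # cycle 0: mode bits only
    sequence.append(mode | UB_RD_START_IN)  # cycle 1: one-cycle start pulse
    sequence.extend([mode] * wait_cycles)   # start cleared while the read runs
    return sequence


for cycle, word in enumerate(read_sequence(transpose=True, wait_cycles=3)):
    print(cycle, f"{word:05b}")
```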
