System control signals
Bits `[4:0]` contain five 1-bit control flags:

System mode switch - controls whether the TPU is actively processing

- `1` = System active, computation in progress
- `0` = System idle

Bit position: `[0]`

Unified Buffer read transaction trigger

- `1` = Start a new read transaction
- `0` = No read initiated

Bit position: `[1]`

Unified Buffer read transpose mode

- `1` = Transpose data during read (for loading transposed weight matrices)
- `0` = Normal read without transpose

Bit position: `[2]`

Host write channel 1 valid flag

- `1` = Data on channel 1 is valid, write to UB
- `0` = No valid data on channel 1

Bit position: `[3]`

Host write channel 2 valid flag

- `1` = Data on channel 2 is valid, write to UB
- `0` = No valid data on channel 2

Bit position: `[4]`

Unified Buffer read control
These fields control data reads from the Unified Buffer:

Number of columns to read from UB

| Value | Columns |
|---|---|
| `00` | 0 |
| `01` | 1 |
| `10` | 2 |
| `11` | 3 |

Bit position: `[6:5]`

Number of rows to read from UB (0-255)

Specifies how many rows of data to read in the current transaction.

Examples:

- `0x08` = Read 8 rows
- `0x04` = Read 4 rows (batch size)
- `0x01` = Read 1 row

Bit position: `[14:7]`

Unified Buffer read address pointer

Selects the starting address in UB for the read transaction.

Bit position: `[16:15]`

The actual implementation uses 2 bits `[16:15]`, providing 4 possible addresses. The README documentation shows 8 bits `[22:15]`, which is a discrepancy with the hardware.

Unified Buffer pointer select - routes UB read data to different modules
| Value | Destination |
|---|---|
| `000` | Systolic array (left input) |
| `001` | Systolic array (top input/weights) |
| `010` | VPU bias module |
| `011` | VPU loss module |
| `100` | VPU activation derivative module |
| `101` | VPU gradient descent (bias) |
| `110` | VPU gradient descent (weights) |

Example: `3'b001` = route read pointer to weight inputs of systolic array

Bit position: `[19:17]`

Host write data
The TPU provides two write channels for loading data into the Unified Buffer:

First host write data word

Fixed-point format: Q8.8 (8 integer bits, 8 fractional bits)

Example: `0xABCD` writes a Q8.8 word whose upper byte `0xAB` holds the integer part and whose lower byte `0xCD` holds the fractional part (205/256)

Bit position: `[35:20]`

Second host write data word

Fixed-point format: Q8.8 (8 integer bits, 8 fractional bits)

Enables writing two values per instruction cycle for faster data loading.

Example: `0x1234` (18.203125 in Q8.8)

Bit position: `[51:36]`

Vector Processing Unit control
VPU pipeline configuration - selects which modules are active

| Value | Configuration | Use case |
|---|---|---|
| `0000` | Bypass | Gradient calculation |
| `0001` | Activation derivative only | Backpropagation |
| `1100` | Bias + Activation | Forward pass layer 1 |
| `1111` | Bias + Activation + Loss | Forward pass final layer |

See VPU data pathways for complete routing details.

Bit position: `[55:52]`

Precomputed scaling factor for MSE loss backpropagation

Fixed-point format: Q8.8

Calculation: `2 / batch_size`

Examples:

- Batch size 4: `0x0080` (2/4 = 0.5 in Q8.8)
- Batch size 32: `0x0010` (2/32 = 0.0625 in Q8.8)

Bit position: `[71:56]`

Leak factor for Leaky ReLU activation function

Fixed-point format: Q8.8

Common values:

- `0x0080` = 0.5 (typical for Leaky ReLU)
- `0x0019` ≈ 0.1 (25/256, common alternative)
- `0x0000` = 0.0 (standard ReLU)

Example: `0x00A0` = 0.625 (160/256 in Q8.8)

Bit position: `[87:72]`

Signal timing
Control signals follow these timing conventions:

- Start signals (`ub_rd_start_in`): Assert for one cycle to initiate operation
- Valid signals (`ub_wr_host_valid_in_*`): Hold high while data is valid
- Mode signals (`ub_rd_transpose`, `sys_switch_in`): Set before starting operation
- Data signals: Must be stable when corresponding valid signal is high
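
To drive these signals from a host-side script or testbench, the individual fields first have to be assembled into one word. The Python sketch below shows one way to do that, using only the bit positions listed in this section. The field names in `FIELDS` and the helpers `to_q88`, `pack`, and `unpack` are hypothetical labels chosen for illustration, not identifiers from the hardware. The Q8.8 conversion assumes two's-complement encoding and truncation toward zero; truncation is chosen because it reproduces the `0x0019` value given for 0.1, but the real toolchain may round differently.

```python
# Field layout from the documented bit positions: name -> (lsb, width).
# Names are illustrative labels (assumption), not hardware identifiers.
FIELDS = {
    "sys_switch":      (0, 1),
    "ub_rd_start":     (1, 1),
    "ub_rd_transpose": (2, 1),
    "host_wr_valid_1": (3, 1),
    "host_wr_valid_2": (4, 1),
    "ub_rd_cols":      (5, 2),
    "ub_rd_rows":      (7, 8),
    "ub_rd_addr":      (15, 2),
    "ub_ptr_select":   (17, 3),
    "host_wr_data_1":  (20, 16),
    "host_wr_data_2":  (36, 16),
    "vpu_pathway":     (52, 4),
    "inv_batch_x2":    (56, 16),
    "leak_factor":     (72, 16),
}


def to_q88(value: float) -> int:
    """Encode a real number as a 16-bit Q8.8 word.

    Assumption: two's complement, truncation toward zero.
    """
    return int(value * 256) & 0xFFFF


def pack(**fields: int) -> int:
    """Pack named field values into a single integer word."""
    word = 0
    for name, value in fields.items():
        lsb, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack(word: int, name: str) -> int:
    """Extract one named field from a packed word."""
    lsb, width = FIELDS[name]
    return (word >> lsb) & ((1 << width) - 1)


# Example: read 4 rows into the weight inputs with Bias + Activation enabled.
word = pack(
    sys_switch=1,
    ub_rd_start=1,
    ub_rd_rows=0x04,
    ub_ptr_select=0b001,
    vpu_pathway=0b1100,
    inv_batch_x2=to_q88(2 / 4),
    leak_factor=to_q88(0.625),
)
print(f"{word:#x}")
```

Keeping the layout in one table means a change such as the address-pointer width discrepancy noted earlier only has to be corrected in a single place, and the range check in `pack` catches values that would spill into a neighbouring field.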