Architecture
The bias module consists of a parent-child hierarchy:
- bias_parent: Top-level module instantiating two bias_child modules for parallel column processing
- bias_child: Individual processing unit handling bias addition for one feature column
Module ports
bias_parent
System clock signal
Active-high reset signal
Bias scalar for column 1, fetched from unified buffer
Bias scalar for column 2, fetched from unified buffer
Data input from systolic array for column 1
Data input from systolic array for column 2
Valid signal for column 1 data from systolic array
Valid signal for column 2 data from systolic array
Pre-activation output (Z) for column 1
Pre-activation output (Z) for column 2
Valid signal for column 1 output
Valid signal for column 2 output
bias_child
System clock signal
Active-high reset signal
Bias scalar value from unified buffer
Data from systolic array
Valid signal from systolic array
Pre-activation output after bias addition
Output valid signal
Operation
The bias module performs fixed-point addition:
Z = X·W + b
where:
- X·W is the matrix multiplication result from the systolic array
- b is the bias term stored in the unified buffer
- Z is the pre-activation output
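As a worked instance of the formula above, here is a minimal Python sketch that encodes real values in the module's 16-bit Q8.8 format and adds a bias to one systolic output. The helper names `to_q88` and `from_q88` are hypothetical and exist only for this illustration; they are not part of the tiny-tpu sources.

```python
FRAC_BITS = 8            # Q8.8: 8 fractional bits
SCALE = 1 << FRAC_BITS   # 256 raw steps per unit


def to_q88(value: float) -> int:
    """Encode a real number as a signed 16-bit Q8.8 raw integer (hypothetical helper)."""
    raw = round(value * SCALE)
    assert -32768 <= raw <= 32767, "value outside the Q8.8 range"
    return raw


def from_q88(raw: int) -> float:
    """Decode a Q8.8 raw integer back to a real number (hypothetical helper)."""
    return raw / SCALE


xw = to_q88(1.5)     # systolic array result X·W, raw value 384
b = to_q88(0.25)     # bias scalar, raw value 64
z = xw + b           # both operands share the binary point, so raw values add directly
print(z, from_q88(z))  # 448 1.75
```

Because both operands use the same format, no shifting is needed before the addition; the raw sum 448 decodes to 1.75.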
Pipeline stages
- Combinational addition: The fxp_add module performs fixed-point addition of the systolic array output and the bias scalar
- Registered output: On the next clock cycle, if the input valid signal is high, the result is registered and the output valid signal is asserted
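The two stages above can be sketched as a small cycle-level model in Python. This is a behavioral approximation, not the RTL: the class name `BiasChildModel` and its argument names are hypothetical, overflow handling is omitted, and clearing the output when the input valid signal is low is an assumption rather than something the text above states.

```python
from dataclasses import dataclass


@dataclass
class BiasChildModel:
    """Hypothetical cycle-level model of one bias lane."""
    z_out: int = 0          # registered pre-activation output
    z_valid: bool = False   # registered output valid flag

    def clock(self, rst: bool, bias: int, sys_data: int, sys_valid: bool):
        """Advance one rising clock edge and return the registered outputs."""
        # Stage 1: combinational addition (overflow handling omitted in this sketch).
        z_next = sys_data + bias
        # Stage 2: registered output, gated by reset and the input valid signal.
        if rst:
            self.z_out, self.z_valid = 0, False
        elif sys_valid:
            self.z_out, self.z_valid = z_next, True
        else:
            # Assumption: the output is cleared when no valid input arrives.
            self.z_out, self.z_valid = 0, False
        return self.z_out, self.z_valid


child = BiasChildModel()
print(child.clock(rst=False, bias=64, sys_data=384, sys_valid=True))   # (448, True)
print(child.clock(rst=False, bias=64, sys_data=0, sys_valid=False))    # (0, False)
```

The values returned by `clock` are the ones visible after the edge, which is how the model expresses the one-cycle delay between a valid input and its registered result.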
Fixed-point arithmetic
The bias module uses 16-bit signed fixed-point representation (Q8.8 format: 8 integer bits, 8 fractional bits). The fxp_add module handles:
- Proper alignment of binary points
- Overflow detection
- Rounding according to configured parameters
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/fixedpoint.sv:110 for the fxp_add implementation.
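To make the three responsibilities concrete, the following Python sketch shows one way a fixed-point adder can align binary points, round, and detect overflow. It is an illustration under stated assumptions, not a transcription of fxp_add: the function name `fxp_add_sketch`, its parameters, round-half-up rounding, and saturation on overflow are all choices made for this example.

```python
def fxp_add_sketch(a: int, a_frac: int, b: int, b_frac: int,
                   out_int: int = 8, out_frac: int = 8):
    """Add two signed fixed-point raw values; return (result, overflow).

    a_frac / b_frac are the fractional bit counts of the inputs.
    The output defaults to Q8.8 (8 integer bits, 8 fractional bits).
    """
    # 1. Align binary points: shift both operands to the finer resolution.
    frac = max(a_frac, b_frac)
    total = (a << (frac - a_frac)) + (b << (frac - b_frac))

    # 2. Round to the output resolution (round half up, an assumption).
    shift = frac - out_frac
    if shift > 0:
        total = (total + (1 << (shift - 1))) >> shift
    elif shift < 0:
        total <<= -shift

    # 3. Detect overflow against the signed output range and saturate (an assumption).
    width = out_int + out_frac
    hi = (1 << (width - 1)) - 1
    lo = -(1 << (width - 1))
    overflow = total > hi or total < lo
    return max(lo, min(hi, total)), overflow


print(fxp_add_sketch(384, 8, 64, 8))       # (448, False)
print(fxp_add_sketch(3, 1, 64, 8))         # (448, False)
print(fxp_add_sketch(32000, 8, 1000, 8))   # (32767, True)
```

The second call adds 1.5 stored with one fractional bit to 0.25 stored in Q8.8, which exercises the alignment step; the third call exceeds the 16-bit signed range and reports overflow.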
Integration with VPU
The bias module is activated during the VPU’s forward pass pathway. The VPU data pathway control bits determine routing:
- Pathway 1100 (forward pass): systolic → bias → leaky_relu → output
- Pathway 1111 (transition): systolic → bias → leaky_relu → loss → leaky_relu_derivative → output
When vpu_data_pathway[3] is set to 1, the VPU routes:
- Systolic array outputs to bias module inputs
- Bias scalars from unified buffer to bias module
- Bias module outputs to the next stage (leaky ReLU)
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv:213-224 for the bias routing logic.
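The pathway encoding can be illustrated with a short Python decoder. Only the mapping of bit 3 to the bias stage is stated in this section; assigning bits 2, 1, and 0 to leaky_relu, loss, and leaky_relu_derivative is an inference from the two pathways listed above, and the names `STAGE_BITS` and `decode_pathway` are hypothetical.

```python
# Bit position -> stage name. Bit 3 comes from the text; bits 2..0 are inferred
# from pathways 1100 and 1111 and should be treated as an assumption.
STAGE_BITS = [
    (3, "bias"),
    (2, "leaky_relu"),
    (1, "loss"),
    (0, "leaky_relu_derivative"),
]


def decode_pathway(pathway: int) -> list[str]:
    """Return the ordered list of stages enabled by a 4-bit pathway value."""
    stages = ["systolic"]
    for bit, name in STAGE_BITS:
        if (pathway >> bit) & 1:
            stages.append(name)
    stages.append("output")
    return stages


print(" → ".join(decode_pathway(0b1100)))
# systolic → bias → leaky_relu → output
print(" → ".join(decode_pathway(0b1111)))
# systolic → bias → leaky_relu → loss → leaky_relu_derivative → output
```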
Implementation details
- Latency: 1 clock cycle (registered output)
- Throughput: 2 values per cycle (dual column processing)
- Bias update frequency: Bias values remain constant for an entire layer and are updated only between layers
- Reset behavior: On reset, output data and valid signals are cleared to zero
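The timing properties above can be checked with a small two-lane simulation in Python. This is a sketch under assumptions: `simulate_bias_parent` is a hypothetical helper, overflow handling is left out, and a lane's output is assumed to clear when its input is not valid. The bias pair is held constant for the whole run, mirroring the per-layer update rule.

```python
def simulate_bias_parent(bias, stream, rst_cycles=()):
    """Simulate two bias lanes cycle by cycle.

    bias:       (b1, b2), held constant for the whole run (one layer).
    stream:     per-cycle inputs, each ((data1, valid1), (data2, valid2)).
    rst_cycles: cycle indices during which reset is asserted.
    Returns the registered outputs visible during each cycle, plus one
    final entry for the state after the last clock edge.
    """
    regs = [(0, False), (0, False)]   # output registers after reset
    trace = []
    for cycle, lanes in enumerate(stream):
        trace.append(tuple(regs))     # outputs visible before this cycle's edge
        for i, (data, valid) in enumerate(lanes):
            if cycle in rst_cycles or not valid:
                regs[i] = (0, False)  # reset, or (assumed) clear on invalid input
            else:
                regs[i] = (data + bias[i], True)
    trace.append(tuple(regs))
    return trace


stream = [
    ((384, True), (256, True)),   # cycle 0: both columns valid
    ((512, True), (0, False)),    # cycle 1: only column 1 valid
]
trace = simulate_bias_parent(bias=(64, -128), stream=stream)
print(trace[1])  # ((448, True), (128, True))
print(trace[2])  # ((576, True), (0, False))
```

Inputs presented in cycle 0 appear at the outputs in cycle 1, which matches the one-cycle latency, and both columns produce a result in the same cycle when both inputs are valid.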
Source files
- Parent module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/bias_parent.sv
- Child module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/bias_child.sv