## Module declaration

## Input ports

### Systolic array inputs
| Port | Width | Description |
|---|---|---|
| `vpu_data_in_1` | signed [15:0] | Data input from systolic array column 1 |
| `vpu_data_in_2` | signed [15:0] | Data input from systolic array column 2 |
| `vpu_valid_in_1` | 1 | Valid signal for column 1 |
| `vpu_valid_in_2` | 1 | Valid signal for column 2 |
### Unified buffer inputs
| Port | Width | Description |
|---|---|---|
| `bias_scalar_in_1` | signed [15:0] | Bias value for column 1 |
| `bias_scalar_in_2` | signed [15:0] | Bias value for column 2 |
| `lr_leak_factor_in` | signed [15:0] | Leak factor α for leaky ReLU (Q8.8 format) |
| `Y_in_1` | signed [15:0] | Ground truth label for loss computation (column 1) |
| `Y_in_2` | signed [15:0] | Ground truth label for loss computation (column 2) |
| `inv_batch_size_times_two_in` | signed [15:0] | Scaling factor: 1/(batch_size × 2) |
| `H_in_1` | signed [15:0] | Cached activation value for derivative (column 1) |
| `H_in_2` | signed [15:0] | Cached activation value for derivative (column 2) |
### Control signal

| Port | Width | Description |
|---|---|---|
| `vpu_data_pathway` | [3:0] | Module enable bits: `[bias \| leaky_relu \| loss \| leaky_relu_derivative]` |
## Output ports
| Port | Width | Description |
|---|---|---|
| `vpu_data_out_1` | signed [15:0] | Processed data output for column 1 |
| `vpu_data_out_2` | signed [15:0] | Processed data output for column 2 |
| `vpu_valid_out_1` | 1 | Valid signal for column 1 output |
| `vpu_valid_out_2` | 1 | Valid signal for column 2 output |
## Architecture

### Module pipeline
The VPU consists of four processing stages:

- **Bias** (`bias_parent`): Adds bias to input values
- **Leaky ReLU** (`leaky_relu_parent`): Applies leaky ReLU activation
- **Loss** (`loss_parent`): Computes loss derivative (∂L/∂H)
- **Leaky ReLU Derivative** (`leaky_relu_derivative_parent`): Computes activation derivative
### Data pathways

The `vpu_data_pathway` signal configures the active modules:

- Bit 3: Bias module enable (1 = enabled)
- Bit 2: Leaky ReLU module enable (1 = enabled)
- Bit 1: Loss module enable (1 = enabled)
- Bit 0: Leaky ReLU derivative module enable (1 = enabled)
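The bit layout above can be mirrored in a small helper, in the style of the repository's Python test suite. The function name `decode_pathway` is a hypothetical illustration, not part of the codebase:

```python
def decode_pathway(pathway: int) -> dict:
    """Return which VPU sub-modules a 4-bit vpu_data_pathway value enables.

    Bit 3 = bias, bit 2 = leaky ReLU, bit 1 = loss,
    bit 0 = leaky ReLU derivative (MSB-first, matching the table above).
    """
    return {
        "bias": bool((pathway >> 3) & 1),
        "leaky_relu": bool((pathway >> 2) & 1),
        "loss": bool((pathway >> 1) & 1),
        "leaky_relu_derivative": bool(pathway & 1),
    }

# 4'b1100 is the documented forward-pass pathway: bias + leaky ReLU only.
print(decode_pathway(0b1100))
```

Decoding the documented pathway values (`4'b1100`, `4'b1111`, `4'b0001`, `4'b0000`) reproduces the operation modes listed below.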
## Operation modes

### Forward pass pathway (4'b1100)

`H = LeakyReLU(Z)` where `Z = X + b`
Use case: Hidden layer activations during forward propagation
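As a rough golden model of this pathway in Q8.8 fixed point, the following sketch assumes signed 16-bit wrap-around and truncation (arithmetic right shift by 8) after the Q8.8 multiply; the RTL may round or saturate differently:

```python
def to_s16(x: int) -> int:
    """Wrap an integer into the signed 16-bit range."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def q88(value: float) -> int:
    """Encode a real number as Q8.8 fixed point (1.0 -> 256)."""
    return to_s16(int(round(value * 256)))

def forward(x: int, bias: int, alpha: int) -> int:
    """H = LeakyReLU(x + bias) with leak factor alpha, all Q8.8."""
    z = to_s16(x + bias)                  # Z = X + b (bias stage)
    return z if z >= 0 else to_s16((z * alpha) >> 8)  # leaky ReLU stage

# x = -2.0, b = 0.5, alpha = 0.125 -> H = 0.125 * (-1.5) = -0.1875
h = forward(q88(-2.0), q88(0.5), q88(0.125))
print(h / 256)  # -0.1875
```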
### Transition pathway (4'b1111)
∂L/∂Z = (H - Y) / (batch_size × 2) ⊙ LeakyReLU'(H)
Use case: Final layer computation that transitions from forward to backward pass. The H matrix is cached internally for use in the derivative calculation.
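A self-contained Q8.8 sketch of this formula, again assuming truncating arithmetic shifts after each fixed-point multiply (an assumption, not a statement about the RTL):

```python
def transition(h: int, y: int, inv_2b: int, alpha: int) -> int:
    """dL/dZ = (H - Y) * 1/(2*batch_size) element-wise-times LeakyReLU'(H).

    All operands are Q8.8 integers; inv_2b is the precomputed
    inv_batch_size_times_two_in scaling factor.
    """
    dl_dh = ((h - y) * inv_2b) >> 8   # loss stage: (H - Y) / (batch_size * 2)
    deriv = 256 if h >= 0 else alpha  # LeakyReLU'(H): 1.0 or alpha in Q8.8
    return (dl_dh * deriv) >> 8       # derivative stage: element-wise product

# batch_size = 2 -> inv_2b = 0.25; H = 1.0, Y = 0.5, alpha = 0.125
print(transition(256, 128, 64, 32) / 256)  # 0.125
```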
### Backward pass pathway (4'b0001)
∂L/∂Z = ∂L/∂H ⊙ LeakyReLU'(H)
Use case: Hidden layer gradients during backpropagation. H values come from H_in_* ports.
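In the same Q8.8 convention, the backward pathway reduces to gating the incoming gradient by the activation's sign (a hedged sketch; truncation behavior is assumed):

```python
def backward(dl_dh: int, h: int, alpha: int) -> int:
    """dL/dZ = dL/dH element-wise-times LeakyReLU'(H), Q8.8.

    h is the cached activation supplied on the H_in_* ports.
    """
    return dl_dh if h >= 0 else (dl_dh * alpha) >> 8

# H >= 0: gradient passes through unchanged; H < 0: scaled by alpha.
print(backward(256, -1, 32) / 256)  # 0.125
```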
### Inactive mode (4'b0000)
All modules bypassed, no processing occurs.
## Activation caching

The VPU includes an internal cache for H matrices (activation outputs). See [src/vpu.sv, lines 333–348](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv#L333-L348).

## Signal routing logic

The VPU uses combinational logic to route signals through the enabled modules. See [src/vpu.sv, lines 187–330](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv#L187-L330).

## Timing behavior
Per-module latency:

- Bias: 1 clock cycle
- Leaky ReLU: 1 clock cycle
- Loss: 1 clock cycle
- Leaky ReLU Derivative: 1 clock cycle

Total pathway latency:

- Forward pass: 2 cycles (bias + leaky ReLU)
- Transition: 4 cycles (all modules)
- Backward pass: 1 cycle (leaky ReLU derivative only)
## Example instantiation

See the instantiation in [src/tpu.sv, lines 157–184](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/tpu.sv#L157-L184).

## Related modules
- TPU - Top-level integration
- Unified Buffer - Data source and destination
- [Bias parent](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/bias_parent.sv)
- [Leaky ReLU parent](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_parent.sv)
- [Loss parent](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/loss_parent.sv)
- [Leaky ReLU derivative parent](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_derivative_parent.sv)
## Testing

See the test files:

- [test/dump_vpu.sv](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/test/dump_vpu.sv) - Waveform dump configuration
- [test/test_vpu.py](https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/test/test_vpu.py) - Python test suite