Architecture
The leaky ReLU module uses a parent-child hierarchy:
- leaky_relu_parent: Top-level module instantiating two leaky_relu_child modules
- leaky_relu_child: Individual processing unit applying activation to one feature column
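This hierarchy can be sketched as a minimal behavioral model in Python (not the RTL); the parent applies one child per feature column, so one call produces two activations. The function names mirror the modules, but `ALPHA_Q88` and the exact arithmetic are illustrative assumptions.

```python
ALPHA_Q88 = 0x0003  # example leak factor in Q8.8 (~0.0117); illustrative value

def leaky_relu_child(x_q88: int, alpha_q88: int) -> int:
    """One processing unit: activation for a single feature column."""
    if x_q88 >= 0:
        return x_q88                      # non-negative: pass through
    return (x_q88 * alpha_q88) >> 8       # negative: Q8.8 multiply (truncating)

def leaky_relu_parent(col1_q88: int, col2_q88: int, alpha_q88: int = ALPHA_Q88):
    """Top level: two child instances, one per column, sharing one alpha."""
    return (leaky_relu_child(col1_q88, alpha_q88),
            leaky_relu_child(col2_q88, alpha_q88))
```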
Module ports
leaky_relu_parent
- System clock signal
- Active-high reset signal
- Leak factor (α) for negative inputs, shared across both columns
- Valid signal for column 1 input data
- Valid signal for column 2 input data
- Pre-activation input for column 1
- Pre-activation input for column 2
- Activated output for column 1
- Activated output for column 2
- Valid signal for column 1 output
- Valid signal for column 2 output
leaky_relu_child
- System clock signal
- Active-high reset signal
- Valid signal for input data
- Pre-activation input value
- Leak factor (α) for negative inputs
- Activated output value
- Output valid signal
Activation function
The leaky ReLU function is defined as:

f(x) = x    if x ≥ 0
f(x) = α·x  if x < 0

Operation
Pipeline stages
- Sign detection: Check if input is non-negative (combinational)
- Conditional computation:
  - If lr_data_in >= 0: Pass through unchanged
  - If lr_data_in < 0: Multiply by leak factor using fxp_mul
- Registered output: On clock edge, output the result with valid signal
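The stages above can be sketched cycle by cycle as a behavioral model, assuming the output register simply captures the result and valid flag on each clock; the class and attribute names are illustrative, not taken from the RTL.

```python
class LeakyReluChildModel:
    """Cycle-level sketch: combinational compute, registered output
    (1-cycle latency), active-high synchronous reset."""

    def __init__(self, alpha_q88: int):
        self.alpha_q88 = alpha_q88
        self.out = 0            # registered activated output
        self.out_valid = False  # registered output-valid flag

    def tick(self, data_in: int, in_valid: bool, reset: bool = False) -> None:
        if reset:  # reset: output cleared, valid deasserted
            self.out, self.out_valid = 0, False
            return
        # Sign detection, then conditional computation (combinational)
        if data_in >= 0:
            result = data_in                          # pass through unchanged
        else:
            result = (data_in * self.alpha_q88) >> 8  # fxp_mul stand-in
        # Clock edge: register result and valid
        self.out, self.out_valid = result, in_valid
```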
Fixed-point arithmetic
The module uses a 16-bit signed fixed-point representation (Q8.8 format). For negative inputs, the fxp_mul module performs:
- Fixed-point multiplication: result = input × leak_factor
- Proper handling of binary point positioning
- Overflow detection and saturation
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/fixedpoint.sv:278 for the fxp_mul implementation.
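A rough Python equivalent of the Q8.8 behavior described above is sketched below; the real fxp_mul may differ in its exact rounding and saturation details.

```python
Q88_MIN, Q88_MAX = -0x8000, 0x7FFF  # 16-bit signed two's-complement range

def fxp_mul_q88(a: int, b: int) -> int:
    """Q8.8 multiply sketch: the full product carries 16 fractional bits,
    so shift the binary point back by 8, then saturate to 16 bits."""
    result = (a * b) >> 8                       # reposition the binary point
    return max(Q88_MIN, min(Q88_MAX, result))   # saturate on overflow
```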
Leak factor
The leak factor is:
- Stored in the unified buffer as a 16-bit fixed-point value
- Shared across both columns (same α for all activations in a layer)
- Typically set to a small value such as 0.01 (≈ 0x0003 in Q8.8)
- Remains constant throughout the forward pass for a layer
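As a sanity check on the Q8.8 encoding (helper names are illustrative): round(0.01 × 256) = 3, so 0.01 encodes as 0x0003 and decodes back to ~0.0117, while 0x0028 (decimal 40) decodes to 0.15625.

```python
def to_q88(x: float) -> int:
    """Encode a real value as a 16-bit Q8.8 word (round to nearest)."""
    return round(x * 256) & 0xFFFF

def from_q88(word: int) -> float:
    """Decode a 16-bit Q8.8 word (two's complement) back to a float."""
    if word & 0x8000:
        word -= 0x10000   # sign-extend
    return word / 256
```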
Integration with VPU
The leaky ReLU module is active during forward pass and transition pathways:
- Pathway 1100 (forward pass): systolic → bias → leaky_relu → output
- Pathway 1111 (transition): systolic → bias → leaky_relu → loss → leaky_relu_derivative → output
When vpu_data_pathway[2] is set to 1:
- Bias module outputs route to leaky ReLU inputs
- Leak factor is provided from unified buffer
- Leaky ReLU outputs proceed to next stage or final output
- Activated values (H matrix) are cached when transitioning to backward pass
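The routing decision can be sketched as follows; the pathway encodings come from this page, but the LSB-first bit indexing for vpu_data_pathway[2] is an assumption, not confirmed from the RTL.

```python
# Stage chains for the two pathways that include the leaky ReLU module.
PATHWAYS = {
    "1100": ["systolic", "bias", "leaky_relu", "output"],
    "1111": ["systolic", "bias", "leaky_relu", "loss",
             "leaky_relu_derivative", "output"],
}

def leaky_relu_enabled(pathway: str) -> bool:
    """Model of vpu_data_pathway[2], indexing bits from the LSB (assumption)."""
    return pathway[::-1][2] == "1"
```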
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv:240-265 for the leaky ReLU routing logic.
Data flow
Implementation details
- Latency: 1 clock cycle (registered output)
- Throughput: 2 activations per cycle
- Comparison operation: Uses sign bit check (lr_data_in >= 0)
- Multiplication: Combinational fixed-point multiply for negative path
- Reset behavior: Outputs cleared to zero, valid signals deasserted
- Zero handling: Zero is treated as non-negative (passes through)
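The sign-bit check above can be illustrated on raw 16-bit words: in two's complement, lr_data_in >= 0 reduces to inspecting bit 15, so zero (bit 15 clear) naturally passes through.

```python
def is_non_negative(word_q88: int) -> bool:
    """Sign detection as a single bit test: bit 15 clear means x >= 0,
    so the hardware needs no full-width comparator."""
    return (word_q88 & 0x8000) == 0
```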
Advantages of leaky ReLU
- Prevents dying neurons: Unlike standard ReLU, neurons with negative inputs still propagate gradients
- Simple hardware: Only requires one comparator and one multiplier per processing unit
- No saturation: Function is unbounded on both positive and negative sides
- Efficient gradient: Derivative is constant (either 1 or α), enabling fast backward pass computation
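The constant derivative in the last point means the backward pass is a two-way select rather than a computation; a minimal sketch, with the α default purely illustrative:

```python
def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Piecewise-constant derivative: 1 for x >= 0 (matching the
    zero-is-non-negative convention above), alpha for x < 0."""
    return 1.0 if x >= 0 else alpha
```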
Source files
- Parent module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_parent.sv
- Child module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_child.sv