The leaky ReLU module applies the leaky rectified linear unit activation function to pre-activation values during the forward pass. This non-linear activation allows the network to learn complex patterns while mitigating the dying ReLU problem.

Architecture

The leaky ReLU module uses a parent-child hierarchy:
  • leaky_relu_parent: Top-level module instantiating two leaky_relu_child modules
  • leaky_relu_child: Individual processing unit applying activation to one feature column
The dual-column design enables parallel processing of two activation values per cycle, matching the VPU’s 2x2 systolic array configuration.

Module ports

leaky_relu_parent

  • clk (input logic): System clock signal
  • rst (input logic): Active-high reset signal
  • lr_leak_factor_in (input logic signed [15:0]): Leak factor (α) for negative inputs, shared across both columns
  • lr_valid_1_in (input logic): Valid signal for column 1 input data
  • lr_valid_2_in (input logic): Valid signal for column 2 input data
  • lr_data_1_in (input logic signed [15:0]): Pre-activation input for column 1
  • lr_data_2_in (input logic signed [15:0]): Pre-activation input for column 2
  • lr_data_1_out (output logic signed [15:0]): Activated output for column 1
  • lr_data_2_out (output logic signed [15:0]): Activated output for column 2
  • lr_valid_1_out (output logic): Valid signal for column 1 output
  • lr_valid_2_out (output logic): Valid signal for column 2 output

leaky_relu_child

  • clk (input logic): System clock signal
  • rst (input logic): Active-high reset signal
  • lr_valid_in (input logic): Valid signal for input data
  • lr_data_in (input logic signed [15:0]): Pre-activation input value
  • lr_leak_factor_in (input logic signed [15:0]): Leak factor (α) for negative inputs
  • lr_data_out (output logic signed [15:0]): Activated output value
  • lr_valid_out (output logic): Output valid signal

Activation function

The leaky ReLU function is defined as:
f(x) = { x      if x ≥ 0
       { α·x    if x < 0
where α (the leak factor) is a small positive constant (typically between 0.01 and 0.3) that allows a small gradient to flow for negative inputs, preventing neurons from “dying” during training.
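As a quick software reference, the function can be expressed in a few lines of Python (a behavioral model, not the hardware implementation):

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: identity for non-negative x, alpha-scaled for negative x."""
    return x if x >= 0 else alpha * x
```

For example, `leaky_relu(3.0)` returns 3.0, while `leaky_relu(-3.0)` returns -0.03.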

Operation

Pipeline stages

  1. Sign detection: Check if input is non-negative (combinational)
  2. Conditional computation:
    • If lr_data_in >= 0: Pass through unchanged
    • If lr_data_in < 0: Multiply by leak factor using fxp_mul
  3. Registered output: On clock edge, output the result with valid signal
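The datapath portion of these stages can be sketched as a bit-accurate Python model. Values are signed 16-bit Q8.8 quantities held as Python ints; the truncating right shift is an assumption about the multiply, not taken from the RTL:

```python
def q88_leaky_relu(data_in: int, leak_factor: int) -> int:
    """Model one child datapath: sign check, then conditional Q8.8 multiply.

    data_in and leak_factor are signed 16-bit Q8.8 values as Python ints.
    """
    # Stage 1: sign detection (equivalent to checking bit 15 of data_in)
    if data_in >= 0:
        # Stage 2a: non-negative inputs pass through unchanged
        return data_in
    # Stage 2b: negative inputs are scaled by the leak factor;
    # a Q8.8 x Q8.8 product is Q16.16, so shift right by 8 to return to Q8.8
    return (data_in * leak_factor) >> 8
```

With a leak factor of 0x0028 (≈0.156), an input of -1.0 (encoded as -256) yields -40, i.e. ≈ -0.156 in Q8.8.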

Fixed-point arithmetic

The module uses 16-bit signed fixed-point representation (Q8.8 format). For negative inputs, the fxp_mul module performs:
  • Fixed-point multiplication: result = input × leak_factor
  • Proper handling of binary point positioning
  • Overflow detection and saturation
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/fixedpoint.sv:278 for the fxp_mul implementation.
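A minimal model of a saturating Q8.8 multiply is shown below; the truncating shift and the exact saturation bounds are assumptions here, so consult the linked fxp_mul source for the actual behavior:

```python
Q88_MAX = 0x7FFF   # most positive signed 16-bit value, ~127.996 in Q8.8
Q88_MIN = -0x8000  # most negative signed 16-bit value, -128.0 in Q8.8

def fxp_mul_q88(a: int, b: int) -> int:
    """Multiply two signed Q8.8 values, realigning the binary point
    and saturating on overflow instead of wrapping."""
    product = (a * b) >> 8  # Q16.16 -> Q8.8 (truncating shift)
    return max(Q88_MIN, min(Q88_MAX, product))
```

For example, 1.0 × 1.0 (0x0100 × 0x0100) yields 0x0100, while multiplying two near-maximum values clamps to Q88_MAX rather than wrapping around.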

Leak factor

The leak factor is:
  • Stored in the unified buffer as a 16-bit fixed-point value
  • Shared across both columns (same α for all activations in a layer)
  • Typically set to a small value; note that 0x0028 in Q8.8 is 40/256 ≈ 0.156, while a conventional α of 0.01 would round to 0x0003
  • Remains constant throughout the forward pass for a layer
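Converting a real-valued α to its Q8.8 representation is a multiply by 256 followed by a round; these helpers are a sketch, and the exact rounding mode used when populating the unified buffer is an assumption:

```python
def to_q88(value: float) -> int:
    """Encode a real number as a signed 16-bit Q8.8 integer."""
    return round(value * 256)

def from_q88(raw: int) -> float:
    """Decode a signed Q8.8 integer back to a real number."""
    return raw / 256
```

For instance, `to_q88(0.15625)` gives 0x0028, and `to_q88(0.01)` gives 0x0003.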

Integration with VPU

The leaky ReLU module is active during forward pass and transition pathways:
  • Pathway 1100 (forward pass): systolic → bias → leaky_relu → output
  • Pathway 1111 (transition): systolic → bias → leaky_relu → loss → leaky_relu_derivative → output
When vpu_data_pathway[2] is set to 1:
  • Bias module outputs route to leaky ReLU inputs
  • Leak factor is provided from unified buffer
  • Leaky ReLU outputs proceed to next stage or final output
  • Activated values (H matrix) are cached when transitioning to backward pass
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv:240-265 for the leaky ReLU routing logic.

Data flow

Bias Module (Z = X·W + b)
         |
         v
[leaky_relu_child] <-- Leak Factor (α) from UB
         |
         v
   Activation H
         |
         +---> Output (forward pass)
         |
         +---> Loss Module (transition)
         |
         +---> Cached for backward pass

Implementation details

  • Latency: 1 clock cycle (registered output)
  • Throughput: 2 activations per cycle
  • Comparison operation: Uses sign bit check (lr_data_in >= 0)
  • Multiplication: Combinational fixed-point multiply for negative path
  • Reset behavior: Outputs cleared to zero, valid signals deasserted
  • Zero handling: Zero is treated as non-negative (passes through)
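The behavior above (combinational datapath, 1-cycle registered latency, reset clearing outputs, zero passing through) can be captured in a small cycle-level model; this is an illustrative sketch, not derived from the RTL:

```python
class LeakyReluChildModel:
    """Cycle-level model of one child: combinational datapath, registered output."""

    def __init__(self) -> None:
        self.data_out = 0
        self.valid_out = False

    def clock(self, rst: bool, valid_in: bool, data_in: int, leak_factor: int) -> None:
        """Advance one clock edge; outputs update one cycle after inputs."""
        if rst:
            # Reset behavior: outputs cleared to zero, valid deasserted
            self.data_out = 0
            self.valid_out = False
        else:
            # Zero is non-negative, so it passes through unchanged
            self.data_out = data_in if data_in >= 0 else (data_in * leak_factor) >> 8
            self.valid_out = valid_in
```

A parent-level model would simply instantiate two of these, one per column, clocked together.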

Advantages of leaky ReLU

  1. Prevents dying neurons: Unlike standard ReLU, neurons with negative inputs still propagate gradients
  2. Simple hardware: Only requires one comparator and one multiplier per processing unit
  3. No saturation: Function is unbounded on both positive and negative sides
  4. Efficient gradient: Derivative is constant (either 1 or α), enabling fast backward pass computation
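The constant derivative in point 4 reduces the backward pass to a simple select between 1 and α; sketched in Python (treating the derivative at x = 0 as 1, a common convention and an assumption here):

```python
def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative of leaky ReLU: 1 for non-negative x, alpha otherwise."""
    return 1.0 if x >= 0 else alpha
```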

Source files

  • Parent module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_parent.sv
  • Child module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_child.sv
