The Vector Processing Unit (VPU) contains four pipelined processing modules that can be selectively activated using the 4-bit vpu_data_pathway field.
VPU pipeline modules
The VPU consists of four sequential modules:
- Bias addition - Adds bias vectors to systolic array outputs
- Leaky ReLU - Applies activation function with configurable leak factor
- MSE loss - Computes mean squared error against target values
- Leaky ReLU derivative - Computes gradient of activation function
Each module can be independently enabled or bypassed based on the current computation stage.
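The bit-to-module mapping is not spelled out explicitly here, but the pathway values used below are consistent with one enable bit per module, ordered bias / Leaky ReLU / MSE / derivative from MSB to LSB. A minimal decoding sketch under that assumption:

```python
# Assumed bit layout (MSB..LSB): bias add, leaky ReLU, MSE loss, ReLU derivative.
# Consistent with 0b1100 (bias + ReLU), 0b1111 (all four), 0b0001 (derivative only).
def decode_pathway(vpu_data_pathway: int) -> dict:
    return {
        "bias_add":        bool(vpu_data_pathway & 0b1000),
        "leaky_relu":      bool(vpu_data_pathway & 0b0100),
        "mse_loss":        bool(vpu_data_pathway & 0b0010),
        "relu_derivative": bool(vpu_data_pathway & 0b0001),
    }

assert decode_pathway(0b1100) == {
    "bias_add": True, "leaky_relu": True,
    "mse_loss": False, "relu_derivative": False,
}
```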
Pathway configurations
The 4-bit vpu_data_pathway field controls which modules are active:
Forward pass - Layer 1
`vpu_data_pathway = 0b1100`
Active modules: Bias addition → Leaky ReLU
Data flow:
- Systolic array output (Z1) enters VPU
- Bias module adds B1 vector
- Leaky ReLU applies activation
- Result (H1) exits VPU
Usage: Computing hidden layer activations during forward propagation
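As a software reference for this pathway (a numpy sketch; the example values and the leak factor alpha are assumptions, since the hardware leak factor is configurable):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):        # alpha: assumed leak factor
    return np.where(z > 0, z, alpha * z)

Z1 = np.array([[0.5, -1.0],
               [2.0, 0.25]])          # example systolic array output
B1 = np.array([0.1, 0.2])             # example bias vector

H1 = leaky_relu(Z1 + B1)              # pathway 0b1100: bias add, then Leaky ReLU
```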
Forward pass - Output layer with loss
`vpu_data_pathway = 0b1111`
Active modules: Bias addition → Leaky ReLU → MSE loss → Leaky ReLU derivative
Data flow:
- Systolic array output (Z2) enters VPU
- Bias module adds B2 vector
- Leaky ReLU applies activation (H2)
- MSE loss computes the error gradient against target Y
- Leaky ReLU derivative multiplies by the activation gradient
- Result (dL/dZ2) exits VPU
Usage: Computing final layer output and beginning backpropagation
This pathway is described in comments as the “transition pathway from forward pass to backward pass” because it both completes the forward computation and produces the first gradient.
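In software terms the transition pathway chains the forward computation with the first gradient; a numpy sketch (example values assumed, and the 1/n scaling of the MSE gradient is an assumption about the loss module):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_prime(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

Z2 = np.array([0.8, -0.3])              # example pre-activation from systolic array
B2 = np.array([0.05, 0.05])             # example bias vector
Y  = np.array([1.0, 0.0])               # example target values

Zb = Z2 + B2                            # bias addition
H2 = leaky_relu(Zb)                     # activation (forward output)
dL_dH2 = 2.0 * (H2 - Y) / H2.size       # MSE loss gradient (1/n scaling assumed)
dL_dZ2 = dL_dH2 * leaky_relu_prime(Zb)  # first backprop gradient exits the VPU
```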
Backward pass - Activation derivative
`vpu_data_pathway = 0b0001`
Active modules: Leaky ReLU derivative only
Data flow:
- Upstream gradient (dL/dZ_next) enters VPU
- Leaky ReLU derivative module multiplies it element-wise by the activation gradient
- Result (dL/dZ) exits VPU
Usage: Propagating gradients through activation functions during backpropagation
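The software equivalent of this step is a single element-wise multiply (numpy sketch; example values and alpha assumed):

```python
import numpy as np

alpha = 0.01                          # assumed leak factor
dL_dZ_next = np.array([0.2, -0.4])    # example upstream gradient
Z = np.array([1.5, -0.7])             # example pre-activation values from UB

dL_dZ = dL_dZ_next * np.where(Z > 0, 1.0, alpha)  # pathway 0b0001
```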
Gradient computation - Bypass mode
`vpu_data_pathway = 0b0000`
Active modules: None (full bypass)
Data flow:
- Systolic array output passes directly through VPU
- No processing applied
- Raw systolic output exits VPU
Usage: Weight gradient calculation where VPU processing is not needed
Pointer routing coordination
The VPU pathway configuration must be coordinated with ub_ptr_select to route the correct data to each module:
| Pathway | Module needing data | ub_ptr_select | Data source |
|---|---|---|---|
| 0b1100 | Bias addition | 010 | Bias vector from UB |
| 0b1111 | Bias addition | 010 | Bias vector from UB |
| 0b1111 | MSE loss | 011 | Target values (Y) from UB |
| 0b0001 | Leaky ReLU derivative | 100 | Pre-activation values (H) from UB |
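Collecting the ub_ptr_select codes used in this section into one place (codes taken from the table above and the test excerpts below; the full decode may define more destinations):

```python
# ub_ptr_select destinations observed in this section (decimal, as the
# testbench writes them; 2, 3, 4 correspond to 0b010, 0b011, 0b100 above).
UB_PTR_SELECT = {
    0: "systolic array left input",
    2: "VPU bias addition module",
    3: "VPU MSE loss target input",
    4: "VPU Leaky ReLU derivative input",
    5: "gradient descent module (bias)",
    6: "gradient descent module (weights)",
}
```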
Example: Forward pass configuration
From test_tpu.py:184-203, loading inputs and computing first layer:
```python
# Configure for forward pass through layer 1
dut.vpu_data_pathway.value = 0b1100  # Bias + ReLU routing

# Read input matrix X into systolic array
dut.ub_rd_start_in.value = 1
dut.ub_ptr_select.value = 0   # Route to systolic left input
dut.ub_rd_addr_in.value = 0
dut.ub_rd_row_size.value = 4
dut.ub_rd_col_size.value = 2

# Read bias B1 into VPU bias module
dut.ub_rd_start_in.value = 1
dut.ub_ptr_select.value = 2   # Route to bias module
dut.ub_rd_addr_in.value = 16
dut.ub_rd_row_size.value = 4
dut.ub_rd_col_size.value = 2
```
Result: the systolic array computes X @ W1^T, then the VPU adds B1 and applies Leaky ReLU to produce H1.
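Both reads above issue the same start/select/address/shape transaction; a hypothetical helper could factor this out (the clock signal name and the single-cycle handshake are assumptions about the testbench):

```python
from cocotb.triggers import RisingEdge

# Hypothetical wrapper; signal names follow the excerpt above, but the
# wait-for-completion timing is an assumption.
async def ub_read(dut, ptr_select, addr, rows, cols):
    dut.ub_rd_start_in.value = 1
    dut.ub_ptr_select.value = ptr_select
    dut.ub_rd_addr_in.value = addr
    dut.ub_rd_row_size.value = rows
    dut.ub_rd_col_size.value = cols
    await RisingEdge(dut.clk)      # assumed: one cycle to launch the read
    dut.ub_rd_start_in.value = 0

# usage: await ub_read(dut, ptr_select=2, addr=16, rows=4, cols=2)
```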
Example: Backward pass configuration
From test_tpu.py:322-349, computing gradients for layer 1:
```python
# Configure for backward pass activation derivative
dut.vpu_data_pathway.value = 0b0001  # Activation derivative only

# Read upstream gradient into systolic array
dut.ub_rd_start_in.value = 1
dut.ub_ptr_select.value = 0   # Route to systolic left input
dut.ub_rd_addr_in.value = 29
dut.ub_rd_row_size.value = 4
dut.ub_rd_col_size.value = 1

# Read pre-activation H1 into VPU derivative module
dut.ub_rd_start_in.value = 1
dut.ub_ptr_select.value = 4   # Route to activation derivative
dut.ub_rd_addr_in.value = 21
dut.ub_rd_row_size.value = 4
dut.ub_rd_col_size.value = 2
```
Result: the systolic output is multiplied element-wise by the activation derivatives to propagate the gradient.
Gradient descent data routing
During weight updates, the VPU uses additional pointer selections:
```python
# Route old bias values to gradient descent module
dut.ub_ptr_select.value = 5  # Gradient descent (bias)

# Route old weight values to gradient descent module
dut.ub_ptr_select.value = 6  # Gradient descent (weights)
```
These pointer selections work with vpu_data_pathway = 0b0000 (bypass mode) since gradient descent happens after the main VPU pipeline.
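Functionally, the gradient descent module applies the standard update rule; a numpy sketch (example values and learning rate assumed):

```python
import numpy as np

lr = 0.01                                      # assumed learning rate
W_old = np.array([[0.5, -0.2], [0.1, 0.3]])    # old weights (ub_ptr_select = 6)
dW    = np.array([[0.02, 0.01], [-0.03, 0.0]]) # weight gradient via bypass mode

W_new = W_old - lr * dW                        # gradient descent step
```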
The VPU is fully pipelined: new data can enter every cycle even while previous data is still moving through later stages.