The control unit is a purely combinational module that decodes an 88-bit instruction word into individual control signals for the TPU's components. It acts as an instruction decoder, mapping fixed bit fields to named control signals.

Module declaration

module control_unit (
    input logic [87:0] instruction,
    // Output signals (see below)
    output logic sys_switch_in,
    output logic ub_rd_start_in,
    output logic ub_rd_transpose,
    output logic ub_wr_host_valid_in_1,
    output logic ub_wr_host_valid_in_2,
    output logic [1:0] ub_rd_col_size,
    output logic [7:0] ub_rd_row_size,
    output logic [1:0] ub_rd_addr_in,
    output logic [2:0] ub_ptr_sel,
    output logic [15:0] ub_wr_host_data_in_1,
    output logic [15:0] ub_wr_host_data_in_2,
    output logic [3:0] vpu_data_pathway,
    output logic [15:0] inv_batch_size_times_two_in,
    output logic [15:0] vpu_leak_factor_in
);

Input port

Input       | Type         | Description
------------|--------------|---------------------------
instruction | logic [87:0] | 88-bit instruction word containing all control fields

Output signals

1-bit control signals (bits 0-4)

Output                | Bit Position | Description
----------------------|--------------|---------------------------
sys_switch_in         | 0            | Switch systolic array weights from shadow to active
ub_rd_start_in        | 1            | Start unified buffer read operation
ub_rd_transpose       | 2            | Read matrix in transposed order
ub_wr_host_valid_in_1 | 3            | Valid signal for host write channel 1
ub_wr_host_valid_in_2 | 4            | Valid signal for host write channel 2

2-bit signals

Output         | Bit Range | Description
---------------|-----------|---------------------------
ub_rd_col_size | 6:5       | Number of columns to read (1-2)
ub_rd_addr_in  | 16:15     | Starting address for unified buffer read

3-bit signal

Output     | Bit Range | Description
-----------|-----------|---------------------------
ub_ptr_sel | 19:17     | Unified buffer pointer selector (0-6)

4-bit signal

Output           | Bit Range | Description
-----------------|-----------|---------------------------
vpu_data_pathway | 55:52     | VPU module enable: [bias|leaky_relu|loss|leaky_relu_deriv]

8-bit signal

Output         | Bit Range | Description
---------------|-----------|---------------------------
ub_rd_row_size | 14:7      | Number of rows to read from unified buffer

16-bit signals

Output                      | Bit Range | Description
----------------------------|-----------|---------------------------
ub_wr_host_data_in_1        | 35:20     | Host data for write channel 1
ub_wr_host_data_in_2        | 51:36     | Host data for write channel 2
inv_batch_size_times_two_in | 71:56     | Scaling factor for loss computation
vpu_leak_factor_in          | 87:72     | Leak factor α for leaky ReLU activation

Instruction format

The 88-bit instruction word is organized as follows:
Bits    | Width | Field Name
--------|-------|---------------------------
0       | 1     | sys_switch_in
1       | 1     | ub_rd_start_in
2       | 1     | ub_rd_transpose
3       | 1     | ub_wr_host_valid_in_1
4       | 1     | ub_wr_host_valid_in_2
6:5     | 2     | ub_rd_col_size
14:7    | 8     | ub_rd_row_size
16:15   | 2     | ub_rd_addr_in
19:17   | 3     | ub_ptr_sel
35:20   | 16    | ub_wr_host_data_in_1
51:36   | 16    | ub_wr_host_data_in_2
55:52   | 4     | vpu_data_pathway
71:56   | 16    | inv_batch_size_times_two_in
87:72   | 16    | vpu_leak_factor_in
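
The layout above could also be written down as a packed struct, so that code building instruction words refers to fields by name instead of by bit index. The following is a minimal sketch, not part of the documented module: the names tpu_instr_pkg, tpu_instr_t, and tpu_instr_tb are hypothetical, and the small testbench only checks that the struct lines up with the bit positions in the table.

```systemverilog
// Hypothetical package: tpu_instr_pkg and tpu_instr_t are illustrative names,
// not taken from the tiny-tpu sources.
package tpu_instr_pkg;
    // In a packed struct the first member occupies the most significant bits,
    // so the fields are listed from bit 87 down to bit 0.
    typedef struct packed {
        logic [15:0] vpu_leak_factor_in;          // 87:72
        logic [15:0] inv_batch_size_times_two_in; // 71:56
        logic [3:0]  vpu_data_pathway;            // 55:52
        logic [15:0] ub_wr_host_data_in_2;        // 51:36
        logic [15:0] ub_wr_host_data_in_1;        // 35:20
        logic [2:0]  ub_ptr_sel;                  // 19:17
        logic [1:0]  ub_rd_addr_in;               // 16:15
        logic [7:0]  ub_rd_row_size;              // 14:7
        logic [1:0]  ub_rd_col_size;              // 6:5
        logic        ub_wr_host_valid_in_2;       // 4
        logic        ub_wr_host_valid_in_1;       // 3
        logic        ub_rd_transpose;             // 2
        logic        ub_rd_start_in;              // 1
        logic        sys_switch_in;               // 0
    } tpu_instr_t;
endpackage

// Hypothetical testbench checking the struct against the documented layout.
module tpu_instr_tb;
    import tpu_instr_pkg::*;

    tpu_instr_t  instr;
    logic [87:0] word;

    initial begin
        instr = '0;                        // clear every field first
        instr.ub_rd_start_in   = 1'b1;
        instr.ub_rd_col_size   = 2'd2;
        instr.ub_rd_row_size   = 8'd2;
        instr.vpu_data_pathway = 4'b1100;
        word = instr;                      // packed struct flattens to 88 bits

        assert ($bits(tpu_instr_t) == 88) else $error("width mismatch");
        assert (word[1]     == 1'b1);
        assert (word[6:5]   == 2'd2);
        assert (word[14:7]  == 8'd2);
        assert (word[55:52] == 4'b1100);
        $finish;
    end
endmodule
```

Because the struct is packed, a value of this type can be assigned directly to the 88-bit instruction port. The trade-off is that the field order in the struct must be kept in sync with the decoder by hand.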

Implementation

The control unit uses continuous assignments to map instruction bits to outputs. From https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/control_unit.sv (lines 36-69):
// 1-bit signals
assign sys_switch_in = instruction[0];
assign ub_rd_start_in = instruction[1];
assign ub_rd_transpose = instruction[2];
assign ub_wr_host_valid_in_1 = instruction[3];
assign ub_wr_host_valid_in_2 = instruction[4];

// 2-bit signals
assign ub_rd_col_size = instruction[6:5];
assign ub_rd_addr_in = instruction[16:15];

// 3-bit signal
assign ub_ptr_sel = instruction[19:17];

// 8-bit signal
assign ub_rd_row_size = instruction[14:7];

// 16-bit signals
assign ub_wr_host_data_in_1 = instruction[35:20];
assign ub_wr_host_data_in_2 = instruction[51:36];
assign vpu_data_pathway = instruction[55:52];  // 4-bit signal, listed here in bit order
assign inv_batch_size_times_two_in = instruction[71:56];
assign vpu_leak_factor_in = instruction[87:72];

Combinational logic

The control unit contains no sequential logic: every output is a combinational function of the instruction input. This means:
  • Zero clock cycle latency
  • No state is stored
  • Outputs follow any change on instruction after only wire propagation delay

Example instruction encoding

Forward pass setup

logic [87:0] instruction;

instruction = '0;           // clear all fields so unset bits are 0, not X
// Start read, pointer=0 (input), 2x2 matrix, no transpose
instruction[1] = 1'b1;      // ub_rd_start_in
instruction[2] = 1'b0;      // ub_rd_transpose
instruction[6:5] = 2'd2;    // ub_rd_col_size = 2
instruction[14:7] = 8'd2;   // ub_rd_row_size = 2
instruction[19:17] = 3'd0;  // ub_ptr_sel = 0 (input)
instruction[55:52] = 4'b1100; // vpu_data_pathway = bias + leaky_relu

Weight loading

// Load weights into unified buffer
instruction[3] = 1'b1;           // ub_wr_host_valid_in_1
instruction[4] = 1'b1;           // ub_wr_host_valid_in_2
instruction[35:20] = 16'h0100;   // ub_wr_host_data_in_1 = 1.0 (Q8.8)
instruction[51:36] = 16'h0080;   // ub_wr_host_data_in_2 = 0.5 (Q8.8)
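
The hex constants above follow from the Q8.8 format named in the comments: the real value is multiplied by 2^8 = 256 and stored as a 16-bit two's-complement integer. A minimal sketch of that conversion is shown below. The helper to_q8_8 and the module q8_8_demo are hypothetical names used for illustration only, and truncation toward zero is an assumption, since the page does not state a rounding mode.

```systemverilog
module q8_8_demo;
    // Hypothetical helper: convert a real value to 16-bit Q8.8 by scaling
    // with 2^8 and truncating toward zero ($rtoi truncates).
    // Values outside [-128, 128) wrap around.
    function automatic logic [15:0] to_q8_8(input real x);
        int scaled;
        scaled = $rtoi(x * 256.0);
        return scaled[15:0];
    endfunction

    initial begin
        assert (to_q8_8(1.0)  == 16'h0100);  // 1.0  * 256 = 256
        assert (to_q8_8(0.5)  == 16'h0080);  // 0.5  * 256 = 128
        assert (to_q8_8(-1.0) == 16'hFF00);  // -256 in two's complement
        $finish;
    end
endmodule
```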

Weight switching

// Switch weights from shadow to active in systolic array
instruction[0] = 1'b1;      // sys_switch_in

Design rationale

The control unit provides several benefits:
  1. Abstraction: Hides bit-level instruction encoding from higher-level modules
  2. Flexibility: Instruction format can be modified by changing only this module
  3. Clarity: Named signals are more readable than bit indices
  4. Reusability: Instruction format is documented in one place

Integration with TPU

In a complete system, the control unit would receive instructions from:
  • Instruction memory (for programmed sequences)
  • Host controller (for interactive control)
  • Microsequencer (for repeated patterns)
Currently, the Tiny TPU design does not include the control unit in the top-level TPU module, but it demonstrates the intended instruction format for future integration.
Related modules:
  • TPU - Receives decoded control signals
  • Unified Buffer - Controlled by read/write signals
  • Systolic Array - Controlled by switch signal
  • VPU - Controlled by pathway selection

Testing

The control unit can be tested by:
  1. Encoding known instruction patterns
  2. Verifying correct signal decoding
  3. Checking all bit positions are correctly mapped
  4. Ensuring every instruction bit is assigned to exactly one output
Example test:
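logic [87:0] instruction = 88'hABCDEF0123456789ABCDEF;
control_unit cu_inst (.instruction(instruction));  // outputs are read hierarchically
initial begin
    #1;  // let the combinational outputs settle
    assert (cu_inst.sys_switch_in == instruction[0]);
    assert (cu_inst.ub_rd_start_in == instruction[1]);
    // ... verify all outputs
end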
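
The four steps listed above can be folded into a single round-trip check: concatenate all outputs in bit order and compare the result with the instruction that was driven in. The sketch below does this for random words. It assumes control_unit.sv from the repository is compiled alongside it, and the testbench name control_unit_tb and the signal rebuilt are hypothetical.

```systemverilog
// Assumes control_unit.sv from the repository is compiled with this file.
module control_unit_tb;   // hypothetical testbench name
    logic [87:0] instruction;

    logic        sys_switch_in, ub_rd_start_in, ub_rd_transpose;
    logic        ub_wr_host_valid_in_1, ub_wr_host_valid_in_2;
    logic [1:0]  ub_rd_col_size, ub_rd_addr_in;
    logic [7:0]  ub_rd_row_size;
    logic [2:0]  ub_ptr_sel;
    logic [3:0]  vpu_data_pathway;
    logic [15:0] ub_wr_host_data_in_1, ub_wr_host_data_in_2;
    logic [15:0] inv_batch_size_times_two_in, vpu_leak_factor_in;

    control_unit dut (.*);   // connect ports to same-named signals

    // Reassemble the outputs in instruction bit order, MSB first.
    logic [87:0] rebuilt;
    assign rebuilt = {vpu_leak_factor_in, inv_batch_size_times_two_in,
                      vpu_data_pathway, ub_wr_host_data_in_2,
                      ub_wr_host_data_in_1, ub_ptr_sel, ub_rd_addr_in,
                      ub_rd_row_size, ub_rd_col_size,
                      ub_wr_host_valid_in_2, ub_wr_host_valid_in_1,
                      ub_rd_transpose, ub_rd_start_in, sys_switch_in};

    initial begin
        repeat (100) begin
            // Three 32-bit random values give 96 bits; the top 8 are dropped.
            instruction = {$urandom(), $urandom(), $urandom()};
            #1;  // let the combinational outputs settle
            assert (rebuilt === instruction)
                else $error("decode mismatch: %h vs %h", rebuilt, instruction);
        end
        $display("control_unit decode check finished");
        $finish;
    end
endmodule
```

If the concatenation matches for every word, each of the 88 bits reaches exactly one output at the documented position, which covers the bit-mapping and unassigned-bit checks in one comparison.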
