One of the reasons deep learning researchers have been able to scale up neural networks over the last decade is that neural networks can be vectorized: they can be implemented very efficiently using matrix multiplications.
Parallel computing hardware, including GPUs and the vector (SIMD) instructions on modern CPUs, is exceptionally good at performing large matrix multiplications. Without vectorization, deep learning would not be nearly as successful as it is today.

Why Vectorization Matters

Vectorization enables:
  • Speed: Matrix operations are orders of magnitude faster than loops
  • Scalability: Train massive neural networks with millions of parameters
  • Efficiency: Leverage GPU and CPU parallel processing capabilities
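The speed claim is easy to check directly. The sketch below (function names forward_loop and forward_vec are illustrative, not from the original) computes the same layer output twice, once with explicit Python loops and once with np.matmul, and times both; the exact speedup depends on your hardware and array sizes.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))  # 200 examples, 100 features
W = rng.standard_normal((100, 50))   # layer with 50 units
B = rng.standard_normal((1, 50))

def forward_loop(X, W, B):
    """One dot product per (example, unit) pair."""
    out = np.zeros((X.shape[0], W.shape[1]))
    for i in range(X.shape[0]):
        for j in range(W.shape[1]):
            out[i, j] = np.dot(X[i], W[:, j]) + B[0, j]
    return out

def forward_vec(X, W, B):
    """A single matrix multiplication does the same work."""
    return np.matmul(X, W) + B

t0 = time.perf_counter(); Z_loop = forward_loop(X, W, B); t_loop = time.perf_counter() - t0
t0 = time.perf_counter(); Z_vec = forward_vec(X, W, B); t_vec = time.perf_counter() - t0

assert np.allclose(Z_loop, Z_vec)  # identical results, very different cost
print(f"loop: {t_loop*1e3:.1f} ms, vectorized: {t_vec*1e3:.1f} ms")
```

On typical hardware the vectorized version wins by one to two orders of magnitude, and the gap grows with array size.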

Non-Vectorized vs Vectorized Implementation

Traditional Loop-Based Implementation

Here’s how you might implement forward propagation using loops:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Input features
x = np.array([200, 17])

# Parameters for 3 neurons: one weight vector and one bias each
# (concrete values taken from the coffee roasting example below)
w = [np.array([1, -2]), np.array([-3, 4]), np.array([5, -6])]
b = [-1, 1, 2]

# Loop through each neuron
a = []
for i in range(3):
    z = np.dot(w[i], x) + b[i]
    a.append(sigmoid(z))

# Output: approximately [1, 0, 1]

Vectorized Matrix Implementation

The same computation using matrix operations:
import numpy as np

# Input features as a 2D array (notice the double brackets)
X = np.array([[200, 17]])  # 1x2 matrix

# Parameters as matrices: each column of W holds one neuron's weights
W = np.array([
    [1, -3, 5],
    [-2, 4, -6]
])                          # 2x3 matrix
B = np.array([[-1, 1, 2]])  # 1x3 matrix

# Vectorized computation (sigmoid as defined above)
Z = np.matmul(X, W) + B
A = sigmoid(Z)

# Output: [[1, 0, 1]]
The vectorized implementation replaces the entire for loop with just two lines of code, and it runs much faster!

Understanding the Vectorized Implementation

Let’s break down how the vectorized version works:
def dense(A_in, W, B):
    """
    Vectorized implementation of a dense layer
    
    Args:
        A_in: Input activations (matrix)
        W: Weight matrix
        B: Bias matrix
    
    Returns:
        A_out: Output activations
    """
    # Matrix multiplication + bias
    Z = np.matmul(A_in, W) + B
    
    # Apply activation function element-wise
    A_out = sigmoid(Z)
    
    return A_out
In the vectorized implementation, all quantities (X, W, B, Z, and A) are 2D arrays (matrices). This allows NumPy to use highly optimized matrix operations.
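To see the dense function in action, here is a self-contained sketch that defines sigmoid (which the snippets above assume) and calls dense with the coffee-roasting parameters used later in this guide:

```python
import numpy as np

def sigmoid(z):
    # exp can overflow for very negative z; the result still saturates to 0
    with np.errstate(over="ignore"):
        return 1 / (1 + np.exp(-z))

def dense(A_in, W, B):
    Z = np.matmul(A_in, W) + B
    return sigmoid(Z)

X = np.array([[200.0, 17.0]])       # 1x2 input
W = np.array([[1.0, -3.0, 5.0],
              [-2.0, 4.0, -6.0]])   # 2x3 weights, one column per unit
B = np.array([[-1.0, 1.0, 2.0]])    # 1x3 biases

A_out = dense(X, W, B)
print(A_out.round(3))  # [[1. 0. 1.]]
```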

Matrix Multiplication in NumPy and TensorFlow

Basic Matrix Multiplication

Consider multiplying matrix A transpose by matrix W:
# Matrix A
A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])

# Transpose of A
AT = A.T  # or np.transpose(A)
# AT = [[1, 3, 5],
#       [2, 4, 6]]

# Matrix W
W = np.array([
    [7, 8],
    [9, 10],
    [11, 12]
])  # 3x2, so its row count matches AT's column count

# Matrix multiplication: (2x3) @ (3x2) -> 2x2
Z = np.matmul(AT, W)
# Z = [[89, 98],
#      [116, 128]]

Two Ways to Multiply Matrices

# Option 1: the explicit matmul function
Z = np.matmul(AT, W)

# Option 2: the @ operator, shorthand for the same operation
Z = AT @ W
Both produce identical results. This guide uses np.matmul() for clarity, but you will often see the @ operator in other code.

Complete Vectorized Forward Propagation

Let’s implement forward propagation for the coffee roasting example:

Setting Up the Data

# Input features: temperature=200°C, duration=17min
A_T = np.array([[200, 17]])  # 1x2 matrix

# Weight matrix (stack w_1, w_2, w_3 as columns)
W = np.array([
    [1, -3, 5],      # First row
    [-2, 4, -6]      # Second row
])  # 2x3 matrix

# Bias matrix
B = np.array([[-1, 1, 2]])  # 1x3 matrix

Computing Layer Output

Step 1: Compute Z Values

# Z = A_T @ W + B
Z = np.matmul(A_T, W) + B
# Result: [[165, -531, 900]]

Understanding the computation, one column of W at a time:
  • First column: (200 × 1) + (17 × -2) + (-1) = 165
  • Second column: (200 × -3) + (17 × 4) + 1 = -531
  • Third column: (200 × 5) + (17 × -6) + 2 = 900

Step 2: Apply Activation Function

# Apply sigmoid element-wise
A = sigmoid(Z)
# Result: [[1, 0, 1]]

  • sigmoid(165) ≈ 1 (very large positive number)
  • sigmoid(-531) ≈ 0 (very large negative number)
  • sigmoid(900) ≈ 1 (very large positive number)
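Z values this extreme (165, -531, 900) saturate sigmoid so hard that a naive 1 / (1 + np.exp(-z)) can trigger overflow warnings, because np.exp(531) exceeds the float64 range. A common remedy, sketched below (stable_sigmoid is an illustrative name, not from the original), picks an algebraically equivalent form based on the sign of z so that exp is only ever called on non-positive arguments:

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never calls exp() on a positive argument, so it cannot overflow."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0: 1 / (1 + exp(-z)), exp argument is <= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    # For z < 0: exp(z) / (1 + exp(z)), exp argument is < 0
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1 + ez)
    return out

print(stable_sigmoid(np.array([165.0, -531.0, 900.0])).round(3))  # [1. 0. 1.]
```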

General Implementation of Dense Layer

Here’s a general-purpose implementation that works for any layer size:
def dense(A_in, W, B):
    """
    Implements a dense neural network layer
    
    Parameters:
    -----------
    A_in : numpy array
        Input activations from previous layer
    W : numpy array (n_input x n_units)
        Weight matrix where each column contains weights for one unit
    B : numpy array (1 x n_units)
        Bias values for each unit
        
    Returns:
    --------
    A_out : numpy array
        Output activations
    """
    # Get number of units from W shape
    units = W.shape[1]
    
    # Initialize output array
    A = np.zeros((1, units))
    
    # For each unit in the layer
    for j in range(units):
        # Get weights for this unit (jth column)
        w = W[:, j]
        
        # Compute weighted sum + bias
        z = np.dot(A_in, w) + B[0, j]
        
        # Apply activation function
        A[0, j] = sigmoid(z)
    
    return A
While this implementation uses a loop for clarity, the matrix multiplication version (np.matmul) is much faster in practice.
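Under the shapes described in the docstring, the loop and matmul versions compute exactly the same thing. A quick sketch to confirm, with illustrative names and random parameters:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense_loop(A_in, W, B):
    """Unit-by-unit computation, as in the loop implementation above."""
    units = W.shape[1]
    A = np.zeros((1, units))
    for j in range(units):
        A[0, j] = sigmoid(np.dot(A_in[0], W[:, j]) + B[0, j])
    return A

def dense_matmul(A_in, W, B):
    """One matrix multiplication for the whole layer."""
    return sigmoid(np.matmul(A_in, W) + B)

rng = np.random.default_rng(1)
A_in = rng.standard_normal((1, 4))
W = rng.standard_normal((4, 6))
B = rng.standard_normal((1, 6))

assert np.allclose(dense_loop(A_in, W, B), dense_matmul(A_in, W, B))
print("loop and matmul versions agree")
```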

Building a Complete Neural Network

Using the dense layer function, you can build a multi-layer network:
def forward_propagation(X, W1, B1, W2, B2, W3, B3, W4, B4):
    """
    Forward propagation through a 4-layer neural network
    
    Args:
        X: Input features
        W1, B1: Layer 1 parameters
        W2, B2: Layer 2 parameters
        W3, B3: Layer 3 parameters
        W4, B4: Layer 4 parameters
    
    Returns:
        f_x: Network output
    """
    # Layer 1
    A1 = dense(X, W1, B1)
    
    # Layer 2
    A2 = dense(A1, W2, B2)
    
    # Layer 3
    A3 = dense(A2, W3, B3)
    
    # Layer 4 (output)
    A4 = dense(A3, W4, B4)
    
    return A4
Notice how we use uppercase W for weight matrices (following linear algebra conventions where uppercase denotes matrices) and lowercase for vectors and scalars.
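To see the layers chain together, here is a runnable sketch of the 4-layer network with randomly initialized parameters. The layer sizes (2 → 5 → 4 → 3 → 1) are illustrative assumptions, not from the original; the point is how each layer's output shape becomes the next layer's input shape.

```python
import numpy as np

def sigmoid(z):
    with np.errstate(over="ignore"):  # large |z| just saturates to 0 or 1
        return 1 / (1 + np.exp(-z))

def dense(A_in, W, B):
    return sigmoid(np.matmul(A_in, W) + B)

def forward_propagation(X, W1, B1, W2, B2, W3, B3, W4, B4):
    A1 = dense(X, W1, B1)
    A2 = dense(A1, W2, B2)
    A3 = dense(A2, W3, B3)
    return dense(A3, W4, B4)

# Illustrative layer sizes: 2 inputs -> 5 -> 4 -> 3 -> 1 output
rng = np.random.default_rng(0)
sizes = [2, 5, 4, 3, 1]
params = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    params += [rng.standard_normal((n_in, n_out)),  # weight matrix
               rng.standard_normal((1, n_out))]     # bias row

X = np.array([[200.0, 17.0]])
f_x = forward_propagation(X, *params)
print(f_x.shape)  # (1, 1)
```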

Code Implementation in TensorFlow

Here’s how the vectorized dense layer looks in TensorFlow:
import tensorflow as tf

def dense(A_in, W, B):
    """
    TensorFlow-style dense layer implementation
    """
    # Matrix multiplication
    Z = tf.matmul(A_in, W) + B
    
    # Activation function
    A_out = tf.nn.sigmoid(Z)
    
    return A_out
TensorFlow follows the convention of laying out individual examples in rows rather than columns, which is why the input is written as A_in rather than a transposed A_T in the actual implementation.

Performance Comparison

Loop Implementation

  • Easy to understand
  • Slow for large networks
  • Not GPU-optimized
  • Sequential processing

Vectorized Implementation

  • Requires matrix knowledge
  • Extremely fast
  • GPU-optimized
  • Parallel processing

Key Benefits of Vectorization

  • Speed: Matrix operations can be 10-100x faster than loops
  • Scalability: Handle networks with millions of parameters efficiently
  • Hardware Utilization: Leverage GPU parallel processing capabilities
  • Code Simplicity: Fewer lines of code, easier to maintain

Matrix Shapes in Neural Networks

Understanding matrix shapes is crucial:
# Layer with 2 inputs, 3 units
A_in: (1, 2)  # 1 example, 2 features
W:    (2, 3)  # 2 inputs, 3 outputs
B:    (1, 3)  # 3 biases
Z:    (1, 3)  # Result of matmul
A_out: (1, 3)  # After activation
Always verify matrix dimensions match for multiplication. For matmul(A, B), the number of columns in A must equal the number of rows in B.
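NumPy enforces this rule at runtime, so a quick sketch shows both the matching and the mismatched case:

```python
import numpy as np

A = np.ones((1, 2))  # 1x2
W = np.ones((2, 3))  # 2x3: columns of A (2) match rows of W (2)
print(np.matmul(A, W).shape)  # (1, 3)

W_bad = np.ones((3, 2))  # 3 rows, but A has only 2 columns
try:
    np.matmul(A, W_bad)
except ValueError as err:
    print("shape mismatch:", err)
```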

Practical Tips

Use print(array.shape) frequently when debugging. Shape mismatches are a common source of errors.
NumPy is great for learning and small experiments. TensorFlow is better for production and GPU acceleration.
Both NumPy and TensorFlow support broadcasting, which automatically handles adding vectors to matrices.
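Broadcasting is what makes the `+ B` in `np.matmul(A_in, W) + B` work even when Z has multiple rows; a minimal demonstration:

```python
import numpy as np

Z = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # 2x3 matrix of pre-activations
B = np.array([[10.0, 20.0, 30.0]])  # 1x3 row of biases

# B is broadcast (virtually copied) across every row of Z
print(Z + B)
# [[11. 22. 33.]
#  [14. 25. 36.]]
```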

Summary

Vectorization is fundamental to modern deep learning:
  • Replaces slow loops with fast matrix operations
  • Enables training of large neural networks
  • Leverages parallel computing hardware
  • Simplifies code implementation
While libraries like TensorFlow handle vectorization automatically, understanding these concepts helps you write better code, debug issues, and optimize performance.

Next Steps

Now that you understand efficient neural network implementations:
  1. Practice with TensorFlow Implementation
  2. Learn about Neural Network Training
  3. Experiment with the accompanying notebooks to see vectorization in action
You now have the foundation to understand how modern deep learning frameworks achieve their impressive performance!
