Learn how neural networks are implemented efficiently using matrix multiplication and vectorization
One of the reasons deep learning researchers have been able to scale up neural networks over the last decade is that neural networks can be vectorized: they can be implemented very efficiently using matrix multiplications.
Parallel computing hardware, including GPUs and the vector instructions in modern CPUs, is exceptionally good at performing large matrix multiplications. Without these vectorization techniques, deep learning would not be nearly as successful today.
Here’s how you might implement forward propagation using loops:
```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation (defined here so the snippet is self-contained)."""
    return 1 / (1 + np.exp(-z))

# Input features
X = np.array([200, 17])

# Parameters for 3 neurons (w_1, w_2, w_3 and b_1, b_2, b_3 are the
# weight vectors and biases of the individual neurons)
W = [w_1, w_2, w_3]  # List of weight vectors
B = [b_1, b_2, b_3]  # List of biases

# Loop through each neuron
a = []
for i in range(3):
    z = np.dot(W[i], X) + B[i]
    a.append(sigmoid(z))

# Output: [1, 0, 1]
```
Let’s break down how the vectorized version works:
```python
def dense(A_in, W, B):
    """
    Vectorized implementation of a dense layer

    Args:
        A_in: Input activations (matrix)
        W: Weight matrix
        B: Bias matrix

    Returns:
        A_out: Output activations
    """
    # Matrix multiplication + bias
    Z = np.matmul(A_in, W) + B
    # Apply activation function element-wise
    A_out = sigmoid(Z)
    return A_out
```
In the vectorized implementation, all quantities (X, W, B, Z, and A) are 2D arrays (matrices). This allows NumPy to use highly optimized matrix operations.
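To make the shapes concrete, here is a small sketch of one forward pass through the vectorized layer. The weight and bias values below are invented for illustration (they are not from the original lesson), and `sigmoid` is defined inline so the snippet runs on its own:

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid activation."""
    return 1 / (1 + np.exp(-z))

# All quantities are 2D arrays (matrices):
X = np.array([[200.0, 17.0]])        # 1 x 2: one example laid out in a row
W = np.array([[1.0, -3.0, 5.0],
              [-2.0, 4.0, -6.0]])    # 2 x 3: one column of weights per neuron
B = np.array([[-1.0, 1.0, 2.0]])     # 1 x 3: one bias per neuron

Z = np.matmul(X, W) + B              # 1 x 3 matrix of pre-activations
A = sigmoid(Z)                       # 1 x 3 matrix of activations
print(A.shape)                       # (1, 3)
```

With these (made-up) parameters the activations come out approximately `[1, 0, 1]`, matching the output of the loop-based version above. Note that every quantity stays 2D throughout, so a single `np.matmul` call handles all three neurons at once.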
Here’s a general-purpose implementation that works for any layer size:
```python
def dense(A_in, W, B):
    """
    Implements a dense neural network layer

    Parameters
    ----------
    A_in : numpy array
        Input activations from previous layer
    W : numpy array (n_input x n_units)
        Weight matrix where each column contains weights for one unit
    B : numpy array (1 x n_units)
        Bias values for each unit

    Returns
    -------
    A_out : numpy array
        Output activations
    """
    # Get number of units from W shape
    units = W.shape[1]

    # Initialize output array
    A = np.zeros((1, units))

    # For each unit in the layer
    for j in range(units):
        # Get weights for this unit (jth column)
        w = W[:, j]
        # Compute weighted sum + bias
        z = np.dot(A_in, w) + B[0, j]
        # Apply activation function
        A[0, j] = sigmoid(z)

    return A
```
While this implementation uses a loop for clarity, the matrix multiplication version (np.matmul) is much faster in practice.
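One way to convince yourself the two forms agree is to run both on the same random inputs and compare. The helper names (`dense_loop`, `dense_matmul`), the `sigmoid` definition, and the random test data below are all assumptions made for this sketch, not part of the original lesson:

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid activation."""
    return 1 / (1 + np.exp(-z))

def dense_loop(A_in, W, B):
    """Loop over units, one dot product per neuron."""
    units = W.shape[1]
    A = np.zeros((1, units))
    for j in range(units):
        z = np.dot(A_in[0], W[:, j]) + B[0, j]
        A[0, j] = sigmoid(z)
    return A

def dense_matmul(A_in, W, B):
    """All units at once via a single matrix multiplication."""
    return sigmoid(np.matmul(A_in, W) + B)

# Random layer of 3 units over 4 inputs, fixed seed for reproducibility
rng = np.random.default_rng(0)
A_in = rng.normal(size=(1, 4))
W = rng.normal(size=(4, 3))
B = rng.normal(size=(1, 3))

assert np.allclose(dense_loop(A_in, W, B), dense_matmul(A_in, W, B))
```

The two versions compute the same numbers; the matmul form simply lets NumPy (and, downstream, the GPU) do all the per-unit dot products in one optimized call.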
Notice how we use uppercase W for weight matrices (following linear algebra conventions where uppercase denotes matrices) and lowercase for vectors and scalars.
Here’s how the vectorized dense layer looks in TensorFlow:
```python
import tensorflow as tf

def dense(A_in, W, B):
    """
    TensorFlow-style dense layer implementation
    """
    # Matrix multiplication
    Z = tf.matmul(A_in, W) + B
    # Activation function
    A_out = tf.nn.sigmoid(Z)
    return A_out
```
TensorFlow follows the convention of laying out individual examples in rows rather than columns, which is why the implementation works with A_in directly rather than a transposed A_T.
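A practical payoff of the examples-in-rows convention is that a whole batch can flow through the layer in one call. The sketch below illustrates this in NumPy rather than TensorFlow to stay dependency-free; the batch values, placeholder weights, and `sigmoid` helper are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid activation."""
    return 1 / (1 + np.exp(-z))

# Two examples, one per row: shape (2, n_features)
A_in = np.array([[200.0, 17.0],
                 [120.0, 5.0]])
W = np.ones((2, 3))    # placeholder weights, one column per unit
B = np.zeros((1, 3))   # placeholder biases, broadcast across the rows

# One matmul processes the entire batch
A_out = sigmoid(np.matmul(A_in, W) + B)
print(A_out.shape)     # (2, 3): one row of activations per example
```

Because each example occupies a row, stacking more examples just adds rows to A_in; the layer code itself never changes.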
Vectorization is fundamental to modern deep learning:
- Replaces slow loops with fast matrix operations
- Enables training of large neural networks
- Leverages parallel computing hardware
- Simplifies code implementation
While libraries like TensorFlow handle vectorization automatically, understanding these concepts helps you write better code, debug issues, and optimize performance.