DenseLayer

A fully connected (dense) neural network layer that performs the affine transformation z = xW + b.

Constructor

DenseLayer(n_in, n_out, rng=None, dtype=np.float32)
• n_in (int, required): Number of input features.
• n_out (int, required): Number of output features (neurons in this layer).
• rng (numpy.random.Generator, default: None): Random number generator for weight initialization. Falls back to np.random if not provided.
• dtype (numpy.dtype, default: np.float32): Data type for weights and biases.

Attributes

• weights (ndarray): Weight matrix of shape (n_in, n_out), initialized with Xavier/Glorot uniform initialization.
• bias (ndarray): Bias vector of shape (1, n_out), initialized to zeros.
• input_cache (ndarray): Cached input from the last forward pass, used during backpropagation.
• z_cache (ndarray): Cached pre-activation output from the last forward pass.

Methods

forward

Performs forward propagation through the layer.
forward(x, weights=None, bias=None)
• x (ndarray, required): Input data of shape (batch_size, n_in).
• weights (ndarray, default: None): Optional weight matrix to use instead of self.weights; useful for running inference at a different precision.
• bias (ndarray, default: None): Optional bias vector to use instead of self.bias.
Returns: ndarray - Pre-activation output of shape (batch_size, n_out), computed as x @ weights + bias.
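
The computation described above can be sketched in plain NumPy (an illustration of the formula, not the library's own implementation):

```python
import numpy as np

def dense_forward(x, weights, bias):
    # z = x @ W + b; the (1, n_out) bias broadcasts across the batch dimension
    return x @ weights + bias

x = np.ones((2, 3), dtype=np.float32)
W = np.full((3, 4), 0.5, dtype=np.float32)
b = np.zeros((1, 4), dtype=np.float32)
z = dense_forward(x, W, b)
print(z.shape)  # (2, 4)
```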

backward

Computes gradients during backpropagation.
backward(delta, l2_lambda=0.0)
• delta (ndarray, required): Gradient of the loss with respect to the layer output, shape (batch_size, n_out).
• l2_lambda (float, default: 0.0): L2 regularization strength; adds a regularization term to the weight gradients.
Returns: tuple[ndarray, ndarray, ndarray]
  • grad_input: Gradient with respect to input, shape (batch_size, n_in)
  • grad_w: Gradient with respect to weights, shape (n_in, n_out)
  • grad_b: Gradient with respect to bias, shape (1, n_out)
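
The gradient shapes above imply the standard dense-layer backward formulas. A hedged NumPy sketch (the library's exact scaling, e.g. averaging over the batch, may differ):

```python
import numpy as np

def dense_backward(x, weights, delta, l2_lambda=0.0):
    grad_input = delta @ weights.T                # (batch_size, n_in)
    grad_w = x.T @ delta + l2_lambda * weights    # (n_in, n_out), with L2 term
    grad_b = delta.sum(axis=0, keepdims=True)     # (1, n_out)
    return grad_input, grad_w, grad_b

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 784))
W = rng.standard_normal((784, 128))
delta = rng.standard_normal((32, 128))
gi, gw, gb = dense_backward(x, W, delta, l2_lambda=0.01)
print(gi.shape, gw.shape, gb.shape)  # (32, 784) (784, 128) (1, 128)
```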

Usage Example

import numpy as np
from layers import DenseLayer

# Create a dense layer: 784 inputs -> 128 outputs
rng = np.random.default_rng(42)
layer = DenseLayer(n_in=784, n_out=128, rng=rng)

# Forward pass
X = np.random.randn(32, 784).astype(np.float32)  # batch of 32 samples
z = layer.forward(X)
print(z.shape)  # (32, 128)

# Backward pass
delta = np.random.randn(32, 128).astype(np.float32)  # gradient from next layer
grad_input, grad_w, grad_b = layer.backward(delta, l2_lambda=0.01)

# Update weights
learning_rate = 0.01
layer.weights -= learning_rate * grad_w
layer.bias -= learning_rate * grad_b
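
To convince yourself the analytic gradients are correct, a finite-difference check works well. This sketch uses a standalone NumPy reimplementation of the formulas documented above (not the library's DenseLayer), with the loss surrogate L = sum(delta * z) so that dL/dW equals x.T @ delta:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
b = np.zeros((1, 2))
delta = rng.standard_normal((4, 2))  # upstream gradient

grad_w = x.T @ delta  # analytic gradient of sum(delta * (x @ W + b)) w.r.t. W

# Central finite difference on one weight entry
eps = 1e-6
i, j = 1, 0
W_plus = W.copy();  W_plus[i, j] += eps
W_minus = W.copy(); W_minus[i, j] -= eps
num = (np.sum(delta * (x @ W_plus + b))
       - np.sum(delta * (x @ W_minus + b))) / (2 * eps)

print(abs(num - grad_w[i, j]) < 1e-4)  # True
```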

custom_uniform

Xavier/Glorot uniform weight initialization function.
custom_uniform(n_in, n_out, rng=None, dtype=np.float32)
• n_in (int, required): Number of input features.
• n_out (int, required): Number of output features.
• rng (numpy.random.Generator, default: None): Random number generator. Falls back to np.random if not provided.
• dtype (numpy.dtype, default: np.float32): Data type for the returned array.
Returns: ndarray - Weight matrix of shape (n_in, n_out) sampled from uniform distribution U(-limit, limit) where limit = sqrt(6 / (n_in + n_out)).

Usage Example

from layers import custom_uniform
import numpy as np

rng = np.random.default_rng(42)
weights = custom_uniform(n_in=784, n_out=128, rng=rng)
print(weights.shape)  # (784, 128)
print(f"Range: [{weights.min():.3f}, {weights.max():.3f}]")

Mathematical Background

Xavier initialization maintains variance across layers by sampling from:
W ~ U(-√(6/(n_in + n_out)), √(6/(n_in + n_out)))
This helps prevent vanishing or exploding gradients during training.
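
The variance claim is easy to verify numerically: a uniform distribution U(-a, a) has variance a²/3, so with a = √(6/(n_in + n_out)) the weight variance is 2/(n_in + n_out). A quick check (illustration only, not library code):

```python
import numpy as np

n_in, n_out = 784, 128
limit = np.sqrt(6.0 / (n_in + n_out))

rng = np.random.default_rng(0)
W = rng.uniform(-limit, limit, size=(n_in, n_out))

# Sample variance should be close to 2 / (n_in + n_out)
print(np.isclose(W.var(), 2.0 / (n_in + n_out), rtol=0.05))  # True
```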