MeanSquaredError

Mean Squared Error (MSE) loss function with optional L2 regularization. Commonly used for regression; it can also be applied to classification with one-hot encoded targets.

Methods

forward

Computes the mean squared error loss with optional L2 regularization.
MeanSquaredError.forward(y_pred, y_true, weights=None, l2_lambda=0.0)
y_pred
ndarray
required
Predicted outputs of shape (batch_size, n_outputs).
y_true
ndarray
required
Ground truth labels of shape (batch_size, n_outputs). Should be one-hot encoded for classification.
weights
list[ndarray]
default:"None"
List of weight matrices from all layers. Required when l2_lambda > 0 for regularization.
l2_lambda
float
default:"0.0"
L2 regularization strength. When > 0, adds penalty term 0.5 * λ * Σ(w²) to the loss.
Returns: float - Scalar loss value. Formula:
Loss = 0.5 * mean(Σ((y_pred - y_true)²)) + 0.5 * λ * Σ(w²)
       └─────────────┬────────────────┘   └──────┬──────┘
              data loss                  regularization
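The formula above can be sketched directly in NumPy. This is a minimal illustration of the documented behavior, not the library's actual source; `mse_forward` is a hypothetical name:

```python
import numpy as np

def mse_forward(y_pred, y_true, weights=None, l2_lambda=0.0):
    # Hypothetical sketch of MeanSquaredError.forward, per the documented formula.
    # Data loss: 0.5 * mean over the batch of per-sample squared-error sums
    data_loss = 0.5 * np.mean(np.sum((y_pred - y_true) ** 2, axis=1))
    # Optional L2 penalty: 0.5 * lambda * sum of squared weights across all layers
    reg_loss = 0.0
    if l2_lambda > 0.0 and weights is not None:
        reg_loss = 0.5 * l2_lambda * sum(np.sum(w ** 2) for w in weights)
    return float(data_loss + reg_loss)
```

Note that the penalty is computed once per loss evaluation over all layer weights, so the same `weights` list must be passed on every call when `l2_lambda > 0`.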

backward

Computes the gradient of the loss with respect to predictions.
MeanSquaredError.backward(y_pred, y_true)
y_pred
ndarray
required
Predicted outputs of shape (batch_size, n_outputs).
y_true
ndarray
required
Ground truth labels of shape (batch_size, n_outputs).
Returns: ndarray - Gradient of loss with respect to predictions, shape (batch_size, n_outputs). Formula: ∂L/∂y_pred = y_pred - y_true
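The gradient formula is simple enough to sketch in full (a hypothetical `mse_backward`, mirroring the documented behavior rather than the library's source):

```python
import numpy as np

def mse_backward(y_pred, y_true):
    # Gradient of the data loss w.r.t. predictions. The 1/N averaging
    # factor is intentionally omitted here; per the docs it is applied
    # later, during the model's weight updates.
    return y_pred - y_true
```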

Usage Examples

Basic Loss Calculation

import numpy as np
from loss import MeanSquaredError

# Predictions and ground truth (one-hot encoded)
y_pred = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1]
])

y_true = np.array([
    [1.0, 0.0, 0.0],  # True class: 0
    [0.0, 1.0, 0.0]   # True class: 1
])

# Compute loss
loss = MeanSquaredError.forward(y_pred, y_true)
print(f"Loss: {loss:.4f}")  # Loss: 0.0500

# Compute gradient
grad = MeanSquaredError.backward(y_pred, y_true)
print(grad)
# [[-0.3  0.2  0.1]
#  [ 0.1 -0.2  0.1]]

Loss with L2 Regularization

import numpy as np
from loss import MeanSquaredError

# Model predictions
y_pred = np.array([[0.7, 0.2, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0]])

# Model weights from two layers
weights = [
    np.random.randn(10, 5),  # Layer 1 weights
    np.random.randn(5, 3)    # Layer 2 weights
]

# Loss without regularization
loss_no_reg = MeanSquaredError.forward(y_pred, y_true, weights=None, l2_lambda=0.0)
print(f"Loss (no reg): {loss_no_reg:.4f}")

# Loss with L2 regularization
loss_with_reg = MeanSquaredError.forward(y_pred, y_true, weights=weights, l2_lambda=0.01)
print(f"Loss (with reg): {loss_with_reg:.4f}")

Integration with Neural Network

from model import NeuralNetworkModel
import numpy as np

# Create model with L2 regularization
model = NeuralNetworkModel(
    layer_sizes=[784, 128, 10],
    activations=['relu', 'softmax'],
    l2_lambda=0.01  # L2 regularization coefficient
)

# Training data
X_train = np.random.randn(100, 784).astype(np.float32)
y_train = np.random.randint(0, 10, size=100)

# The model internally uses MeanSquaredError for loss calculation
history = model.fit(X_train, y_train, epochs=10, alpha=0.01)

# Access training loss (includes regularization)
print(f"Final training loss: {history['loss'][-1]:.4f}")

Custom Training Loop

import numpy as np
from model import NeuralNetworkModel
from loss import MeanSquaredError

model = NeuralNetworkModel(
    layer_sizes=[784, 64, 10],
    activations=['relu', 'softmax'],
    l2_lambda=0.001
)

# One training step
X_batch = np.random.randn(32, 784).astype(np.float32)
y_batch_onehot = np.eye(10)[np.random.randint(0, 10, 32)]  # One-hot encoded

# Forward pass
y_pred = model.forward(X_batch, training=True)

# Compute loss
loss = MeanSquaredError.forward(
    y_pred, 
    y_batch_onehot, 
    weights=model.weights,
    l2_lambda=0.001
)

print(f"Batch loss: {loss:.4f}")

# Compute gradients
grad = MeanSquaredError.backward(y_pred, y_batch_onehot)
print(f"Gradient shape: {grad.shape}")  # (32, 10)

Mathematical Details

MSE Loss

For a batch of size N with C output dimensions, where i indexes samples and j indexes outputs:
Loss = (1/2N) * Σᵢ Σⱼ (yᵢⱼ_pred - yᵢⱼ_true)²
The factor of 1/2 simplifies the gradient to y_pred - y_true.
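The cancellation of the 1/2 factor can be verified numerically with a central finite difference. A toy check, assuming the 1/2N loss form above (here with a batch of one, so the /N factor is 1):

```python
import numpy as np

y_pred = np.array([[0.7, 0.2, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0]])
N = y_pred.shape[0]

loss = lambda p: (1.0 / (2 * N)) * np.sum((p - y_true) ** 2)

# Analytic gradient of the 1/2N loss: (y_pred - y_true) / N
analytic = (y_pred - y_true) / N

# Central finite difference on the first component
eps = 1e-6
p_hi, p_lo = y_pred.copy(), y_pred.copy()
p_hi[0, 0] += eps
p_lo[0, 0] -= eps
numeric = (loss(p_hi) - loss(p_lo)) / (2 * eps)
print(numeric, analytic[0, 0])  # both ≈ -0.3
```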

L2 Regularization

Penalizes large weights to prevent overfitting:
Reg_loss = (λ/2) * Σₗ Σᵢⱼ wₗᵢⱼ²
Where λ is l2_lambda and w are the weights from all layers.
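Worked on tiny, made-up weight matrices (chosen so the sum is easy to check by hand):

```python
import numpy as np

l2_lambda = 0.01
weights = [np.ones((2, 2)),        # layer 1: four entries of 1 → Σw² = 4
           2 * np.ones((2, 1))]    # layer 2: two entries of 2 → Σw² = 8

# (λ/2) * Σₗ Σᵢⱼ w²  =  0.005 * (4 + 8)  =  0.06
reg_loss = 0.5 * l2_lambda * sum(np.sum(w ** 2) for w in weights)
print(reg_loss)  # 0.06
```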

Gradient

The gradient with respect to predictions is:
∂L/∂y_pred = (y_pred - y_true) / N
Note: The implementation returns y_pred - y_true without the /N factor, as the averaging is handled during weight updates in the model’s backward pass.
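To see that the two conventions agree, divide the raw gradient by the batch size yourself. A toy check; in practice this division happens inside the model's update step:

```python
import numpy as np

y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
N = y_pred.shape[0]

raw_grad = y_pred - y_true   # what backward() returns
mean_grad = raw_grad / N     # gradient of the averaged (1/2N) loss
print(mean_grad[0])          # [-0.15  0.1   0.05]
```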
