mlpp::losses namespace.
Regression losses
Regression losses measure the discrepancy between continuous predictions and ground-truth values.

Mean squared error (MSE)
The most common regression loss, MSE is the arithmetic mean of squared prediction errors:

- Differentiable everywhere
- Heavily penalizes large errors (quadratic penalty)
- Sensitive to outliers
- Optimal for Gaussian noise
Minimizing MSE yields the maximum likelihood estimate when errors follow a normal distribution.
Mean absolute error (MAE)
MAE uses absolute differences, providing a more robust alternative to MSE:

- More robust to outliers than MSE
- Non-differentiable at zero (subgradient exists)
- Linear penalty for errors
- Optimal for Laplacian noise

Prefer MAE when:

- Data contains outliers or heavy-tailed distributions
- You want to minimize median error rather than mean error
- Interpretability is important (same units as target)
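Again as an illustrative sketch (not the library's real API), MAE simply averages absolute errors:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute error: (1/n) * sum_i |y_pred[i] - y_true[i]|
double mae(const std::vector<double>& y_true, const std::vector<double>& y_pred) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        sum += std::abs(y_pred[i] - y_true[i]);  // linear penalty, same units as the target
    }
    return sum / static_cast<double>(y_true.size());
}
```

With the same example as above (errors of 1 and 2), MAE = (1 + 2) / 2 = 1.5, versus MSE = 2.5; the quadratic penalty inflates the larger error.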
Huber loss
The Huber loss combines the best properties of MSE and MAE: it is quadratic for small errors and linear for large errors.

delta (δ): Threshold where the loss transitions from quadratic to linear

- Small δ: More robust, approaches MAE
- Large δ: Less robust, approaches MSE
- Default: δ = 1.0
- Differentiable everywhere
- Robust to outliers (like MAE)
- Smooth gradients near zero (like MSE)
- Parameterized robustness
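A minimal sketch of the piecewise definition follows. Note the 1/2 factor in the quadratic branch is one common convention (it makes the two branches join smoothly at |e| = δ); whether MLPP uses exactly this scaling is an assumption here:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Huber loss: quadratic for |e| <= delta, linear beyond it.
// The 0.5 factors make loss and gradient continuous at the transition.
double huber(const std::vector<double>& y_true, const std::vector<double>& y_pred,
             double delta = 1.0) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double e = std::abs(y_pred[i] - y_true[i]);
        sum += (e <= delta) ? 0.5 * e * e                 // MSE-like region
                            : delta * (e - 0.5 * delta);  // MAE-like region
    }
    return sum / static_cast<double>(y_true.size());
}
```

At the boundary e = δ both branches evaluate to 0.5·δ², so the loss surface has no kink in value, only in curvature.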
Classification losses
Classification losses measure the quality of discrete predictions.

Binary cross entropy
Used for binary classification with probabilistic outputs in [0, 1]:

- Outputs should be probabilities (sigmoid or softmax)
- Convex in log-odds space
- Heavily penalizes confident misclassifications
- Maximum likelihood for Bernoulli distributions
Predictions are automatically clamped to [ε, 1-ε] where ε = 1e-12 to prevent numerical instability from log(0).
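The clamping described above can be sketched as follows (a standalone illustration, not MLPP's actual signature):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Binary cross entropy with predictions clamped to [eps, 1 - eps]
// so that log() never receives 0.
double binary_cross_entropy(const std::vector<double>& y_true,
                            const std::vector<double>& y_pred) {
    const double eps = 1e-12;
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double p = std::clamp(y_pred[i], eps, 1.0 - eps);  // numerical guard
        sum += -(y_true[i] * std::log(p) + (1.0 - y_true[i]) * std::log(1.0 - p));
    }
    return sum / static_cast<double>(y_true.size());
}
```

A maximally uncertain prediction of 0.5 for a positive example costs -log(0.5) = ln 2 ≈ 0.693; a confident wrong prediction near 0 costs far more, which is the "heavy penalty for confident misclassifications" noted above.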
Multiclass cross entropy
Generalization of binary cross entropy for K > 2 classes.

y_true: One-hot encoded labels, shape (n_samples, n_classes)
y_pred: Predicted probabilities (typically from softmax), shape (n_samples, n_classes)

- Each prediction vector should sum to 1.0
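A sketch over nested vectors (the library's real container types may differ; the `eps` guard mirrors the binary case):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Multiclass cross entropy: -(1/n) * sum_i sum_k y_true[i][k] * log(y_pred[i][k]).
// With one-hot labels only the true class of each sample contributes.
double cross_entropy(const std::vector<std::vector<double>>& y_true,
                     const std::vector<std::vector<double>>& y_pred) {
    const double eps = 1e-12;
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        for (std::size_t k = 0; k < y_true[i].size(); ++k) {
            sum += -y_true[i][k] * std::log(std::max(y_pred[i][k], eps));  // avoid log(0)
        }
    }
    return sum / static_cast<double>(y_true.size());
}
```

With K = 2 and one-hot labels this reduces to binary cross entropy, as the section states.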
Hinge loss
The standard SVM loss for binary classification with labels in {-1, +1}:

- Designed for large-margin classification
- Zero loss for correctly classified points beyond the margin
- Linear penalty for violations
- Non-differentiable at margin boundary
- Loss = 0: Correct classification with margin ≥ 1
- Loss > 0: Either misclassified or margin < 1
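The two cases above fall out directly of the formula max(0, 1 - y·f); a standalone sketch (hypothetical function name, raw decision scores rather than probabilities):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hinge loss: mean of max(0, 1 - y * f), with labels y in {-1, +1}
// and raw (unsquashed) decision scores f.
double hinge(const std::vector<double>& y_true, const std::vector<double>& scores) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        // Margin y * f >= 1 gives zero loss; anything less is penalized linearly.
        sum += std::max(0.0, 1.0 - y_true[i] * scores[i]);
    }
    return sum / static_cast<double>(y_true.size());
}
```

For example, a positive sample with score 2 contributes 0 (margin satisfied), score 0 contributes 1 (on the boundary), and a negative sample with score 0.5 contributes 1.5 (misclassified).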
Squared hinge loss
A differentiable variant of hinge loss with a quadratic penalty:

- Differentiable everywhere (enables gradient-based optimization)
- Stronger penalty for large margin violations
- Smoother loss surface
- More sensitive to outliers than standard hinge
- May require smaller learning rates
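The change from hinge loss is a single squaring of the margin violation, sketched here under the same assumptions as the hinge example:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Squared hinge: mean of max(0, 1 - y * f)^2.
// Squaring removes the kink at the margin boundary, so the loss is
// differentiable everywhere, at the cost of amplifying large violations.
double squared_hinge(const std::vector<double>& y_true,
                     const std::vector<double>& scores) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double m = std::max(0.0, 1.0 - y_true[i] * scores[i]);
        sum += m * m;  // quadratic penalty on margin violations
    }
    return sum / static_cast<double>(y_true.size());
}
```

The negative sample with score 0.5 from the hinge example now costs 1.5² = 2.25 instead of 1.5, which illustrates both the stronger penalty and the increased outlier sensitivity.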
Regularization terms
MLPP provides regularization penalties to prevent overfitting.

L1 penalty (Lasso)
Encourages sparse solutions:

- Drives small weights to exactly zero
- Performs automatic feature selection
- Non-differentiable at zero (use proximal methods)
L2 penalty (Ridge)
Encourages small but non-zero weights:

- Shrinks all weights toward zero
- Differentiable everywhere
- Improves numerical stability
Elastic net penalty
Combines L1 and L2 regularization.

alpha (α): Overall regularization strength
l1_ratio (ρ): Mixing parameter ∈ [0, 1]

- ρ = 0: Pure L2 (ridge)
- ρ = 1: Pure L1 (lasso)
- 0 < ρ < 1: Combination of both

Prefer elastic net when:

- Features are correlated (L2 helps where L1 struggles)
- Need feature selection but want to keep correlated groups
- More stable than pure L1 when features >> samples
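Since elastic net reduces to L1 at ρ = 1 and to L2 at ρ = 0, all three penalties can be sketched with one function. The scaling here (the 0.5 on the L2 term, scikit-learn style) is an assumption; MLPP's exact convention may differ:

```cpp
#include <cmath>
#include <vector>

// Elastic net penalty: alpha * (rho * ||w||_1 + 0.5 * (1 - rho) * ||w||_2^2).
// rho = 1 gives pure L1 (lasso); rho = 0 gives pure L2 (ridge).
double elastic_net_penalty(const std::vector<double>& w,
                           double alpha, double l1_ratio) {
    double l1 = 0.0, l2 = 0.0;
    for (double wi : w) {
        l1 += std::abs(wi);  // L1 part: promotes exact zeros (feature selection)
        l2 += wi * wi;       // L2 part: shrinks all weights, never to exactly zero
    }
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2);
}
```

For w = {3, -4}: at ρ = 1 the penalty is α·(|3| + |-4|) = 7α, and at ρ = 0 it is α·0.5·(9 + 16) = 12.5α.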