MLPP provides a suite of loss functions for regression and classification tasks. All functions are implemented as templated free functions in the mlpp::losses namespace.

Regression losses

Regression losses measure the discrepancy between continuous predictions and ground truth values.

Mean squared error (MSE)

The most common regression loss, MSE is the arithmetic mean of squared prediction errors:
L(y, ŷ) = (1/n) Σᵢ (ŷᵢ - yᵢ)²
#include <MLPP/Losses/loss_functions.hpp>

std::vector<double> y_true = {1.0, 2.0, 3.0, 4.0};
std::vector<double> y_pred = {1.1, 2.2, 2.9, 4.1};

double loss = mlpp::losses::mse(y_true, y_pred);
Properties:
  • Differentiable everywhere
  • Heavily penalizes large errors (quadratic penalty)
  • Sensitive to outliers
  • Optimal for Gaussian noise
MSE is the maximum likelihood estimator when errors follow a normal distribution.
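For reference, the formula above can be sketched as a standalone function. This is an illustrative re-implementation (the name mse_ref is hypothetical), not MLPP's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the MSE formula: the mean of squared residuals.
double mse_ref(const std::vector<double>& y_true,
               const std::vector<double>& y_pred) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double r = y_pred[i] - y_true[i];
        sum += r * r;  // quadratic penalty: large errors dominate
    }
    return sum / static_cast<double>(y_true.size());
}
```

The quadratic term is why a single outlier can dominate the total loss.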

Mean absolute error (MAE)

MAE uses absolute differences, providing a more robust alternative to MSE:
L(y, ŷ) = (1/n) Σᵢ |ŷᵢ - yᵢ|
double loss = mlpp::losses::mae(y_true, y_pred);
Properties:
  • More robust to outliers than MSE
  • Non-differentiable at zero (subgradient exists)
  • Linear penalty for errors
  • Optimal for Laplacian noise
When to use:
  • Data contains outliers or heavy-tailed distributions
  • You want a loss that is minimized by the median of the target rather than the mean
  • Interpretability is important (same units as target)
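The same sketch for MAE replaces the squared residual with its absolute value (again an illustrative standalone function, not MLPP's code; mae_ref is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the MAE formula: the mean of absolute residuals.
double mae_ref(const std::vector<double>& y_true,
               const std::vector<double>& y_pred) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i)
        sum += std::fabs(y_pred[i] - y_true[i]);  // linear penalty
    return sum / static_cast<double>(y_true.size());
}
```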

Huber loss

The Huber loss combines the best properties of MSE and MAE by being quadratic for small errors and linear for large errors:
         ⎧ (1/2)r²              if |r| ≤ δ
L(r) =   ⎨
         ⎩ δ(|r| - δ/2)         if |r| > δ

where r = ŷᵢ - yᵢ
double delta = 1.0;  // Transition threshold
double loss = mlpp::losses::huber(y_true, y_pred, delta);
Parameters:
  • delta (δ): Threshold where loss transitions from quadratic to linear
    • Small δ: More robust, approaches MAE
    • Large δ: Less robust, approaches MSE
    • Default: δ = 1.0
Properties:
  • Differentiable everywhere
  • Robust to outliers (like MAE)
  • Smooth gradients near zero (like MSE)
  • Parameterized robustness
The choice of delta significantly affects model behavior. Cross-validation is recommended for selecting δ.
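The piecewise definition above translates directly into code. The following is an illustrative sketch (huber_ref is a hypothetical name, not MLPP's implementation):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the Huber loss: quadratic for |r| <= delta, linear beyond.
double huber_ref(const std::vector<double>& y_true,
                 const std::vector<double>& y_pred, double delta) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double r = std::fabs(y_pred[i] - y_true[i]);
        sum += (r <= delta) ? 0.5 * r * r            // MSE-like region
                            : delta * (r - 0.5 * delta);  // MAE-like region
    }
    return sum / static_cast<double>(y_true.size());
}
```

Note that the two branches and their derivatives agree at |r| = δ, which is what makes the loss differentiable everywhere.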

Classification losses

Classification losses measure the quality of discrete predictions.

Binary cross entropy

Used for binary classification with probabilistic outputs in [0, 1]:
L(y, p) = -(1/n) Σᵢ [yᵢ log(pᵢ) + (1 - yᵢ) log(1 - pᵢ)]
std::vector<double> y_true = {0.0, 1.0, 1.0, 0.0};
std::vector<double> y_pred = {0.1, 0.9, 0.8, 0.2};

double loss = mlpp::losses::binary_cross_entropy(y_true, y_pred);
Properties:
  • Outputs should be probabilities (sigmoid or softmax)
  • Convex in log-odds space
  • Heavily penalizes confident misclassifications
  • Maximum likelihood for Bernoulli distributions
Predictions are automatically clamped to [ε, 1-ε] where ε = 1e-12 to prevent numerical instability from log(0).
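The formula together with the clamping step can be sketched as follows (bce_ref is a hypothetical standalone function, not MLPP's code; it assumes C++17 for std::clamp):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of binary cross entropy with eps-clamping to avoid log(0).
double bce_ref(const std::vector<double>& y_true,
               const std::vector<double>& y_pred) {
    const double eps = 1e-12;
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double p = std::clamp(y_pred[i], eps, 1.0 - eps);
        sum += y_true[i] * std::log(p)
             + (1.0 - y_true[i]) * std::log(1.0 - p);
    }
    return -sum / static_cast<double>(y_true.size());
}
```

Without the clamp, a prediction of exactly 0.0 or 1.0 on the wrong label would produce an infinite loss.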

Multiclass cross entropy

Generalization of binary cross entropy for K > 2 classes:
L(y, p) = -(1/n) Σᵢ Σₖ yᵢₖ log(pᵢₖ)
std::vector<std::vector<double>> y_true = {
    {1.0, 0.0, 0.0},  // Sample 1: class 0
    {0.0, 1.0, 0.0},  // Sample 2: class 1
    {0.0, 0.0, 1.0}   // Sample 3: class 2
};

std::vector<std::vector<double>> y_pred = {
    {0.8, 0.1, 0.1},
    {0.1, 0.7, 0.2},
    {0.2, 0.3, 0.5}
};

double loss = mlpp::losses::multiclass_cross_entropy(y_true, y_pred);
Format:
  • y_true: One-hot encoded labels, shape (n_samples, n_classes)
  • y_pred: Predicted probabilities (typically from softmax), shape (n_samples, n_classes)
  • Each prediction vector should sum to 1.0
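The double sum over samples and classes can be sketched like this (mce_ref is a hypothetical name; MLPP's own implementation may also clamp probabilities, which this sketch omits):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of multiclass cross entropy over one-hot labels.
double mce_ref(const std::vector<std::vector<double>>& y_true,
               const std::vector<std::vector<double>>& y_pred) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i)
        for (std::size_t k = 0; k < y_true[i].size(); ++k)
            if (y_true[i][k] > 0.0)  // only the true class contributes
                sum += y_true[i][k] * std::log(y_pred[i][k]);
    return -sum / static_cast<double>(y_true.size());
}
```

With one-hot labels, each inner sum reduces to -log of the probability assigned to the true class.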

Hinge loss

The standard SVM loss for binary classification with labels in {-1, +1}:
L(y, f) = (1/n) Σᵢ max(0, 1 - yᵢ·fᵢ)
std::vector<double> y_true = {-1.0, 1.0, 1.0, -1.0};
std::vector<double> y_pred = {-0.8, 1.2, 0.5, -1.5};

double loss = mlpp::losses::hinge_loss(y_true, y_pred);
Properties:
  • Designed for large-margin classification
  • Zero loss for correctly classified points beyond the margin
  • Linear penalty for violations
  • Non-differentiable at margin boundary
Interpretation:
  • Loss = 0: Correct classification with margin ≥ 1
  • Loss > 0: Either misclassified or margin < 1
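The interpretation above follows directly from the formula; a standalone sketch (hinge_ref is a hypothetical name, not MLPP's code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the hinge loss for labels in {-1, +1}.
double hinge_ref(const std::vector<double>& y_true,
                 const std::vector<double>& y_pred) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y_true.size(); ++i)
        // y*f >= 1 means correct with full margin -> zero loss
        sum += std::max(0.0, 1.0 - y_true[i] * y_pred[i]);
    return sum / static_cast<double>(y_true.size());
}
```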

Squared hinge loss

A differentiable variant of hinge loss with quadratic penalty:
L(y, f) = (1/n) Σᵢ [max(0, 1 - yᵢ·fᵢ)]²
double loss = mlpp::losses::squared_hinge_loss(y_true, y_pred);
Advantages over hinge:
  • Differentiable everywhere (enables gradient-based optimization)
  • Stronger penalty for large margin violations
  • Smoother loss surface
Trade-offs:
  • More sensitive to outliers than standard hinge
  • May require smaller learning rates

Regularization terms

MLPP provides regularization penalties to prevent overfitting.

L1 penalty (Lasso)

Encourages sparse solutions:
R(w) = Σⱼ |wⱼ|
std::vector<double> weights = {0.5, -0.3, 0.0, 0.8};
double penalty = mlpp::losses::l1_penalty(weights);
Effects:
  • Drives small weights to exactly zero
  • Performs automatic feature selection
  • Non-differentiable at zero (use proximal methods)
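The proximal method mentioned above boils down to soft-thresholding, which is the mechanism that sets weights exactly to zero. A sketch of the operator (soft_threshold is a hypothetical helper, not part of MLPP's API):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Soft-thresholding: the proximal operator of lambda * |w|.
// Weights with |w| <= lambda are mapped exactly to zero,
// which is how L1 regularization produces sparse solutions.
double soft_threshold(double w, double lambda) {
    const double shrunk = std::max(std::fabs(w) - lambda, 0.0);
    return (w < 0.0) ? -shrunk : shrunk;
}
```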

L2 penalty (Ridge)

Encourages small but non-zero weights:
R(w) = Σⱼ wⱼ²
double penalty = mlpp::losses::l2_penalty(weights);
Effects:
  • Shrinks all weights toward zero
  • Differentiable everywhere
  • Improves numerical stability

Elastic net penalty

Combines L1 and L2 regularization:
R(w) = α[ρ·‖w‖₁ + (1-ρ)·‖w‖₂²]
double alpha = 0.1;      // Overall regularization strength
double l1_ratio = 0.5;   // Balance between L1 and L2

double penalty = mlpp::losses::elastic_net_penalty(
    weights, alpha, l1_ratio
);
Parameters:
  • alpha (α): Overall regularization strength
  • l1_ratio (ρ): Mixing parameter ∈ [0, 1]
    • ρ = 0: Pure L2 (ridge)
    • ρ = 1: Pure L1 (lasso)
    • 0 < ρ < 1: Combination
When to use:
  • Correlated features (L2 helps where L1 struggles)
  • Need feature selection but want to keep correlated groups
  • More stable than pure L1 when features >> samples
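The formula R(w) = α[ρ·‖w‖₁ + (1-ρ)·‖w‖₂²] can be sketched as a standalone function (elastic_net_ref is a hypothetical name, not MLPP's code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of the elastic net penalty:
// alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) * ||w||_2^2)
double elastic_net_ref(const std::vector<double>& w,
                       double alpha, double l1_ratio) {
    double l1 = 0.0, l2 = 0.0;
    for (double wj : w) {
        l1 += std::fabs(wj);  // L1 term: promotes sparsity
        l2 += wj * wj;        // L2 term: shrinks all weights
    }
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2);
}
```

Setting l1_ratio to 1 or 0 recovers the pure L1 and L2 penalties respectively.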

Template support

All loss functions support arbitrary arithmetic types:
// Single precision
std::vector<float> y_true_f = {1.0f, 2.0f, 3.0f};
std::vector<float> y_pred_f = {1.1f, 2.2f, 2.9f};
float loss_f = mlpp::losses::mse(y_true_f, y_pred_f);

// Double precision
std::vector<double> y_true_d = {1.0, 2.0, 3.0};
std::vector<double> y_pred_d = {1.1, 2.2, 2.9};
double loss_d = mlpp::losses::mse(y_true_d, y_pred_d);

// Integer (for counting)
std::vector<int> y_true_i = {1, 2, 3};
std::vector<int> y_pred_i = {1, 2, 3};
int loss_i = mlpp::losses::mae(y_true_i, y_pred_i);

Error handling

All loss functions validate input sizes:
try {
    std::vector<double> y_true = {1.0, 2.0, 3.0};
    std::vector<double> y_pred = {1.0, 2.0};  // Size mismatch!
    
    double loss = mlpp::losses::mse(y_true, y_pred);
} catch (const std::invalid_argument& e) {
    std::cerr << "Error: " << e.what() << std::endl;
    // Output: "mse: y_true and y_pred size mismatch."
}
