MLPP provides two essential cross-validation tools: stratified k-fold splitting for reliable performance estimates and ROC curve analysis for threshold-agnostic binary classification evaluation.

Stratified k-fold

The StratifiedKFold class partitions data into k folds while maintaining class proportions in each fold. This is critical for imbalanced datasets where naive splitting can produce folds with zero representation of minority classes.

Basic usage

#include "Model Validation/stratified_kfold.hpp"

using namespace mlpp::model_validation;

// Labels for your dataset
std::vector<int> labels = {0, 0, 0, 1, 1, 1, 2, 2, 2};

// Create 3-fold splitter with shuffling
StratifiedKFold<int> skf(3, true, 42);  // n_splits=3, shuffle=true, seed=42

// Generate all train/val splits
auto splits = skf.split(labels);

for (size_t fold = 0; fold < splits.size(); ++fold) {
    auto [train_idx, val_idx] = splits[fold];
    
    // Use train_idx and val_idx to partition your data
    std::cout << "Fold " << fold << ":\n";
    std::cout << "  Train: " << train_idx.size() << " samples\n";
    std::cout << "  Val:   " << val_idx.size() << " samples\n";
}

Constructor

explicit StratifiedKFold(std::size_t n_splits = 5,
                         bool        shuffle   = false,
                         std::size_t seed      = 0);
Parameters:
  • n_splits: Number of folds k ≥ 2 (default: 5)
  • shuffle: Whether to shuffle within each class before assigning to folds (default: false)
  • seed: RNG seed when shuffle is true (default: 0)
Always use shuffle=true with a fixed seed for splits that are randomized yet reproducible. This prevents bias from any inherent ordering in your dataset.

Splitting strategy

The stratification algorithm guarantees balanced class distribution:
  1. Group by class: Sample indices are grouped by their class label
  2. Shuffle (if enabled): Indices within each class are randomly shuffled
  3. Round-robin assignment: Each class’s samples are distributed across folds in round-robin order
    • Fold 0 gets indices 0, k, 2k, …
    • Fold 1 gets indices 1, k+1, 2k+1, …
  4. Union: Each split returns k-1 folds as training data and 1 fold as validation data
For a class with m samples, each fold receives either ⌊m/k⌋ or ⌈m/k⌉ samples, the minimum possible imbalance.

Methods

split

std::vector<Split> split(const std::vector<Label>& labels) const;
Generates all k train/val index splits. Returns: Vector of k {train_indices, val_indices} pairs where:
  • Split is std::pair<Indices, Indices>
  • Indices is std::vector<std::size_t>
Indices refer to positions in the input labels vector.

n_classes

std::size_t n_classes() const noexcept;
Returns the number of unique classes found in the last call to split(). Only valid after calling split().

n_splits

std::size_t n_splits() const noexcept;
Returns the configured number of folds.

Example: Cross-validated evaluation

#include "Model Validation/stratified_kfold.hpp"
#include "Model Validation/confusion_matrix.hpp"
#include "Model Validation/metrics.h"
#include <numeric>

using namespace mlpp::model_validation;

// Your dataset
std::vector<int> labels;                    // your class labels
std::vector<std::vector<double>> features;  // your feature vectors

// 5-fold cross-validation
StratifiedKFold<int> skf(5, true, 42);
auto splits = skf.split(labels);

std::vector<double> fold_f1_scores;

for (const auto& [train_idx, val_idx] : splits) {
    // Train model on train_idx
    // auto model = train(features, labels, train_idx);
    
    // Evaluate on val_idx
    ConfusionMatrix<std::size_t, int> cm(3);
    for (size_t i : val_idx) {
        // int pred = model.predict(features[i]);
        // cm.update(labels[i], pred);
    }
    
    Metrics metrics(cm);
    fold_f1_scores.push_back(metrics.macro_f1());
}

// Compute mean F1 across folds
double mean_f1 = std::accumulate(fold_f1_scores.begin(), 
                                 fold_f1_scores.end(), 0.0) / fold_f1_scores.size();
Stratified k-fold guarantees that each class appears in every fold proportional to its frequency in the full dataset. This is essential for reliable evaluation on imbalanced data.

ROC curves

The ROCCurve class computes receiver operating characteristic curves and AUC (area under curve) for binary classifiers. Unlike hard predictions, ROC analysis uses raw classifier scores and is threshold-agnostic.

Basic usage

#include "Model Validation/roc_curve.hpp"

using namespace mlpp::model_validation;

// Classifier scores (higher = more likely positive)
std::vector<double> scores = {0.9, 0.8, 0.4, 0.6, 0.3, 0.7};

// Ground truth binary labels
std::vector<int> labels = {1, 1, 0, 1, 0, 0};

// Compute ROC curve (positive class = 1)
ROCCurve<double, int> roc(scores, labels, 1);

std::cout << "AUC: " << roc.auc() << "\n";
std::cout << "Optimal threshold: " << roc.optimal_threshold() << "\n";

// Access curve points
for (const auto& pt : roc.curve()) {
    std::cout << "FPR=" << pt.fpr << " TPR=" << pt.tpr << "\n";
}

Constructor

explicit ROCCurve(const std::vector<Score>& scores,
                  const std::vector<Label>& labels,
                  Label                     pos_label = Label(1));
Parameters:
  • scores: Raw classifier output, length n_samples. Higher score = more likely positive.
  • labels: Ground-truth binary labels.
  • pos_label: Value in labels that denotes the positive class (default: 1).

Curve construction

The ROC curve is built by:
  1. Sorting samples by score in descending order
  2. Sweeping the decision threshold from +∞ to -∞
  3. Recording (FPR, TPR) at every unique score value
Tie handling: Samples with identical scores are processed as a batch before emitting a curve point. This matches scikit-learn’s convention and avoids jagged curves on discrete score distributions.

Methods

curve

const std::vector<Point>& curve() const noexcept;
Returns ordered curve points from (0,0) to (1,1), where each Point has:
struct Point {
    double fpr;  // False positive rate = FP / (FP + TN)
    double tpr;  // True positive rate = TP / (TP + FN)
};

auc

double auc() const noexcept;
Area under the ROC curve computed via the trapezoidal rule. Range [0, 1]. Interpretation: AUC equals the probability that the classifier ranks a random positive sample higher than a random negative sample (Wilcoxon-Mann-Whitney statistic).
AUC is preferred over accuracy or F1 for imbalanced evaluation because it is threshold-agnostic and unaffected by class distribution.

optimal_threshold

Score optimal_threshold() const noexcept;
Returns the score at the curve point closest to the ideal corner (FPR=0, TPR=1) by Euclidean distance. This criterion is closely related to maximizing the Youden J statistic, argmax_t (TPR(t) - FPR(t)); the two often select the same point, but they are not equivalent in general.

n_pos / n_neg

std::size_t n_pos() const noexcept;
std::size_t n_neg() const noexcept;
Returns the number of positive and negative samples.

Multiclass ROC (one-vs-rest)

For multiclass problems, compute one ROC curve per class using one-vs-rest:

roc_ovr

static std::vector<ROCCurve> roc_ovr(
    const std::vector<std::vector<Score>>& scores,
    const std::vector<Label>&              labels,
    std::size_t                            n_classes);
Parameters:
  • scores: Score matrix, shape (n_samples, n_classes). scores[i][k] is the confidence that sample i belongs to class k.
  • labels: Integer class labels, length n_samples.
  • n_classes: Number of classes K.
Returns: Vector of K ROCCurve objects, one per class.

macro_auc

static double macro_auc(
    const std::vector<std::vector<Score>>& scores,
    const std::vector<Label>&              labels,
    std::size_t                            n_classes);
Computes macro-average AUC: unweighted mean of per-class AUCs from one-vs-rest.

Example: Multiclass ROC

#include "Model Validation/roc_curve.hpp"

using namespace mlpp::model_validation;

// Score matrix: scores[i][k] = P(sample i is class k)
std::vector<std::vector<double>> scores = {
    {0.8, 0.1, 0.1},  // Sample 0: likely class 0
    {0.2, 0.7, 0.1},  // Sample 1: likely class 1
    {0.1, 0.2, 0.7},  // Sample 2: likely class 2
    // ...
};

std::vector<int> labels = {0, 1, 2, /* ... */};

// Compute one-vs-rest ROC curves
auto roc_curves = ROCCurve<double, int>::roc_ovr(scores, labels, 3);

for (size_t k = 0; k < roc_curves.size(); ++k) {
    std::cout << "Class " << k << " AUC: " << roc_curves[k].auc() << "\n";
}

// Macro-average AUC
double macro_auc = ROCCurve<double, int>::macro_auc(scores, labels, 3);
std::cout << "Macro AUC: " << macro_auc << "\n";
The ROC curve implementation uses trapezoidal integration for AUC calculation. The empirical ROC curve is piecewise linear, so the trapezoidal rule computes its area exactly; batches of tied scores contribute sloped segments, which the rule integrates by linear interpolation between breakpoints.
