The Metrics class provides a comprehensive set of classification metrics derived from a confusion matrix. It supports both per-class metrics and aggregated macro/micro averages.

Basic usage

#include "Model Validation/confusion_matrix.hpp"
#include "Model Validation/metrics.h"

using namespace mlpp::model_validation;

// Build confusion matrix: update takes the true label, then the predicted label
ConfusionMatrix<std::size_t> cm(3);
cm.update(0, 0);  // class 0 correctly predicted
cm.update(0, 1);  // class 0 misclassified as class 1
cm.update(1, 1);
cm.update(2, 2);

// Compute metrics
Metrics metrics(cm);

// Per-class metrics
double p0 = metrics.precision(0);
double r0 = metrics.recall(0);
double f1_0 = metrics.f1(0);

// Aggregated metrics
double macro_f1 = metrics.macro_f1();
double micro_f1 = metrics.micro_f1();

Per-class metrics

All per-class metrics accept a class index k (zero-based).

Precision

double precision(std::size_t k) const noexcept;
Precision measures what fraction of predicted positives are actually positive:
Precision = TP / (TP + FP)
High precision means few false positives.

Recall

double recall(std::size_t k) const noexcept;
Recall (sensitivity, true positive rate) measures what fraction of actual positives are correctly identified:
Recall = TP / (TP + FN)
High recall means few false negatives.

F1 score

double f1(std::size_t k) const noexcept;
F1 score is the harmonic mean of precision and recall:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
F1 balances both precision and recall, reaching its best value at 1.0 and worst at 0.0.
F1 score is more informative than accuracy for imbalanced datasets because it accounts for both false positives and false negatives.

IoU (Intersection over Union)

double iou(std::size_t k) const noexcept;
IoU measures overlap between predicted and actual positives:
IoU = TP / (TP + FP + FN)
Commonly used in segmentation tasks and object detection.

Macro averages

Macro averages compute the metric for each class independently, then take the unweighted mean. This treats all classes equally regardless of support.

Macro precision

double macro_precision() const noexcept;
Unweighted mean of per-class precision values.

Macro recall

double macro_recall() const noexcept;
Unweighted mean of per-class recall values.

Macro F1

double macro_f1() const noexcept;
Unweighted mean of per-class F1 scores.

Mean IoU

double mean_iou() const noexcept;
Unweighted mean of per-class IoU values.
Macro averaging gives equal weight to all classes, making it sensitive to performance on rare classes. Use this when all classes are equally important.

Micro averages

Micro averages aggregate counts across all classes first, then compute the metric. This weights classes by their support (number of samples).

Micro precision

double micro_precision() const noexcept;
Computes precision from global TP and FP counts:
Micro Precision = Σ(TP_k) / (Σ(TP_k) + Σ(FP_k))

Micro recall

double micro_recall() const noexcept;
Computes recall from global TP and FN counts:
Micro Recall = Σ(TP_k) / (Σ(TP_k) + Σ(FN_k))

Micro F1

double micro_f1() const noexcept;
Harmonic mean of micro precision and micro recall.
For multi-class problems, micro precision equals micro recall (both equal accuracy). Micro averaging weights classes by support, so it emphasizes performance on frequent classes.

Basic counts

The Metrics class also exposes raw count accessors:
T tp(std::size_t k) const noexcept;  // True positives for class k
T fp(std::size_t k) const noexcept;  // False positives for class k
T fn(std::size_t k) const noexcept;  // False negatives for class k
These are computed from the confusion matrix (rows indexed by true label, columns by predicted label, matching cm.update(y_true, y_pred)):
  • tp(k): Diagonal element cm[k][k]
  • fp(k): Sum of column k excluding the diagonal (samples predicted as k that belong to other classes)
  • fn(k): Sum of row k excluding the diagonal (samples of class k predicted as other classes)

Template requirements

The Metrics class is templated on the confusion matrix type:
template<typename CM>
class Metrics {
    using T = typename std::remove_cvref_t<CM>::value_type;
    // ...
};
This works with any ConfusionMatrix<T, Label> instantiation, automatically deducing the count type T.

Example: Multi-class evaluation

#include "Model Validation/confusion_matrix.hpp"
#include "Model Validation/metrics.h"
#include <iostream>

using namespace mlpp::model_validation;

int main() {
    // Simulate predictions for 3-class problem
    std::vector<int> y_true = {0, 0, 1, 1, 2, 2};
    std::vector<int> y_pred = {0, 1, 1, 1, 2, 0};
    
    ConfusionMatrix<std::size_t, int> cm(3);
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        cm.update(y_true[i], y_pred[i]);
    }
    
    Metrics metrics(cm);
    
    std::cout << "Per-class metrics:\n";
    for (std::size_t k = 0; k < 3; ++k) {
        std::cout << "Class " << k << ": "
                  << "P=" << metrics.precision(k) << " "
                  << "R=" << metrics.recall(k) << " "
                  << "F1=" << metrics.f1(k) << "\n";
    }
    
    std::cout << "\nAggregated metrics:\n";
    std::cout << "Macro F1: " << metrics.macro_f1() << "\n";
    std::cout << "Micro F1: " << metrics.micro_f1() << "\n";
    
    return 0;
}
