Classification algorithms predict discrete categorical labels from input features. MLPP provides implementations of classic discriminative and generative classifiers.
## Support Vector Machine (SVM)
SVM finds the maximum-margin hyperplane separating two classes using kernel functions. MLPP implements the dual formulation with kernel caching for efficiency.
SVM solves the dual problem:

```
maximize    W(α) = Σ_i α_i − 1/2 Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
subject to  0 ≤ α_i ≤ C
            Σ_i α_i y_i = 0
```

The decision function is:

```
f(x) = Σ_i α_i y_i K(x_i, x) + b
```
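As a plain-C++ sketch (independent of MLPP's classes), the decision function can be evaluated directly from the dual variables. The `alphas`, `ys`, `xs`, and `b` values are hypothetical stand-ins for a trained model, and a 1-D RBF kernel is assumed:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// 1-D RBF kernel: K(a, b) = exp(-gamma * (a - b)^2)
double rbf_k(double a, double b, double gamma) {
    double d = a - b;
    return std::exp(-gamma * d * d);
}

// Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
// alphas, ys, xs, b are assumed to come from a trained model.
double decision(const std::vector<double>& alphas,
                const std::vector<double>& ys,
                const std::vector<double>& xs,
                double b, double gamma, double x) {
    double f = b;
    for (std::size_t i = 0; i < xs.size(); ++i)
        f += alphas[i] * ys[i] * rbf_k(xs[i], x, gamma);
    return f;
}

// The sign of the decision value is the predicted label.
int predict(double f) { return f >= 0.0 ? +1 : -1; }
```

With two support vectors at -1 (class +1) and +1 (class -1), a query point near -1 lands on the positive side.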
### Basic usage

```cpp
#include <mlpp/classifiers/SVM/SVM.hpp>
#include <mlpp/classifiers/SVM/Kernel/rkhs_kernels.hpp>

using namespace mlpp::classifiers::kernel;

// Prepare data
std::vector<Vector> data;  // Training samples
Eigen::VectorXd labels;    // Class labels in {-1, +1}

// Create RBF kernel
auto kernel = kernels::rbf(1.0); // gamma = 1.0

// Create SVM with C = 1.0
SVM svm(data, labels, kernel, 1.0);

// Train model
svm.fit();

// Predict new sample
Vector x_new;
int prediction = svm.predict(x_new); // Returns -1 or +1

// Get decision function value
double score = svm.decision(x_new);

// Get support vector indices
auto sv_indices = svm.support_indices();
```
### Constructor parameters

- `data` (`const std::vector<Vector>&`): Training samples as a vector of Eigen vectors.
- `labels` (`Eigen::VectorXd`): Class labels with values in {-1, +1} for binary classification.
- `kernel`: Kernel function K(x, y). Common choices:
  - `kernels::rbf(gamma)`: Radial basis function
  - `kernels::linear()`: Linear kernel
  - `kernels::polynomial(degree, coef0)`: Polynomial kernel
- `C` (`double`): Soft margin penalty parameter, C > 0. Higher values penalize margin violations more strictly but may overfit.
### Methods

`void fit()`

Train the SVM model using the configured optimization strategy (typically the SMO algorithm).

`int predict(const Vector& x) const`

Predict the class label (+1 or -1) for a new sample.

`double decision(const Vector& x) const`

Evaluate the decision function f(x) = Σ α_i y_i K(x_i, x) + b. Positive values indicate class +1, negative values class -1.

`std::vector<std::size_t> support_indices(double eps = 1e-8) const`

Return the indices of the support vectors (samples with α_i > eps).
### Kernel selection

**RBF kernel**

Radial basis function (Gaussian) kernel: K(x, y) = exp(-γ ||x - y||²)

```cpp
auto kernel = kernels::rbf(1.0); // gamma = 1.0
SVM svm(data, labels, kernel, 1.0);
```

Good for non-linear boundaries. Tune γ via cross-validation.

**Linear kernel**

Simple dot product: K(x, y) = xᵀy

```cpp
auto kernel = kernels::linear();
SVM svm(data, labels, kernel, 1.0);
```

Fast and interpretable. Use for linearly separable data.

**Polynomial kernel**

Polynomial kernel of degree d:

```cpp
auto kernel = kernels::polynomial(3, 1.0); // degree=3, coef0=1.0
SVM svm(data, labels, kernel, 1.0);
```

Models polynomial relationships. Can be numerically unstable for large d.
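Stripped of the `kernels::` factories, the three kernels reduce to a few lines each. A minimal plain-C++ sketch using `std::vector<double>` in place of Eigen vectors (the polynomial form (xᵀy + coef0)^d shown here is the standard one and an assumption about MLPP's exact definition):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Linear: K(x, y) = x . y
double linear_kernel(const Vec& x, const Vec& y) { return dot(x, y); }

// RBF: K(x, y) = exp(-gamma * ||x - y||^2)
double rbf_kernel(const Vec& x, const Vec& y, double gamma) {
    double sq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double d = x[i] - y[i];
        sq += d * d;
    }
    return std::exp(-gamma * sq);
}

// Polynomial: K(x, y) = (x . y + coef0)^degree
double poly_kernel(const Vec& x, const Vec& y, int degree, double coef0) {
    return std::pow(dot(x, y) + coef0, degree);
}
```

Note that the RBF kernel of a point with itself is always 1, which is why RBF Gram matrices have a unit diagonal.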
## Logistic regression
Logistic regression models class probabilities using the sigmoid (logistic) function. MLPP provides both binary and multi-class variants.
### Binary logistic regression

```cpp
#include <mlpp/classifiers/logistic_regression.h>

using namespace mlpp::classifiers;

LogisticRegressionBinary<double> model;

// Fit to training data (labels must be 0 or 1)
model.fit(X_train, y_train,
          0.01,   // learning_rate
          1000,   // max_iter
          1e-6);  // tol

// Predict probabilities
auto probs = model.predict_proba(X_test);

// Predict class labels (threshold = 0.5)
auto y_pred = model.predict(X_test, 0.5);

// Get coefficients
auto theta = model.coefficients();
auto intercept = model.intercept();
```
The model estimates:

```
P(y = 1 | x; θ) = σ(θᵀ x̃),   where σ(z) = 1 / (1 + exp(-z))
```
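A minimal sketch of that probability model, independent of the library; the `predict_label` helper and its 0.5 default threshold mirror the `predict` call above but are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// sigma(z) = 1 / (1 + exp(-z)), mapping any real score into (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// P(y = 1 | x) = sigma(theta^T x_tilde); predict 1 when it exceeds the threshold.
int predict_label(double theta_dot_x, double threshold = 0.5) {
    return sigmoid(theta_dot_x) >= threshold ? 1 : 0;
}
```

Since σ(0) = 0.5, the 0.5 probability threshold corresponds exactly to the sign of the linear score θᵀx̃.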
### Multi-class logistic regression

Uses a one-vs-rest strategy for K ≥ 2 classes:

```cpp
LogisticRegressionMulti<double> model;

// Fit to multi-class data (labels are integer class indices)
model.fit(X_train, y_train,
          0.01,   // learning_rate
          1000,   // max_iter
          1e-6);  // tol

// Predict class probabilities (n_samples × n_classes)
auto probs = model.predict_proba(X_test);

// Predict class labels
auto y_pred = model.predict(X_test);

// Get coefficient matrix (n_classes × n_features+1)
auto thetas = model.coefficients();
```
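Under one-vs-rest, each of the K binary models scores the sample and the highest-scoring class wins. A minimal sketch of that final argmax step (the per-class probabilities here are made up, not produced by MLPP):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given one P(y = k | x) per class from K binary models, pick the argmax.
std::size_t predict_ovr(const std::vector<double>& class_probs) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < class_probs.size(); ++k)
        if (class_probs[k] > class_probs[best]) best = k;
    return best;
}
```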
### Parameters

- `learning_rate` (`Scalar`, default `0.01`): Gradient descent step size. Smaller values are more stable but converge more slowly.
- `max_iter` (`std::size_t`, default `1000`): Maximum number of gradient descent iterations.
- `tol` (`Scalar`, default `1e-6`): Convergence tolerance. Training stops when ||Δθ||∞ < tol.
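All three parameters drive the same loop: step along the negative gradient, and stop when the update falls below `tol` or after `max_iter` iterations. A hypothetical scalar sketch, minimizing (θ − 3)² instead of the logistic loss so the mechanics stand alone:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Minimize f(theta) = (theta - 3)^2 by gradient descent; the gradient is
// 2 * (theta - 3). Stops when |delta_theta| < tol (the infinity norm in
// one dimension) or after max_iter steps.
double gradient_descent(double theta, double learning_rate,
                        std::size_t max_iter, double tol) {
    for (std::size_t it = 0; it < max_iter; ++it) {
        double delta = -learning_rate * 2.0 * (theta - 3.0);
        theta += delta;
        if (std::abs(delta) < tol) break;
    }
    return theta;
}
```

With `learning_rate = 0.1` the error shrinks by a factor of 0.8 per step, so the loop converges to θ ≈ 3 well within 1000 iterations.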
### Methods (Binary)

`void fit(const Matrix& X, const Vector& y, Scalar learning_rate = 0.01, std::size_t max_iter = 1000, Scalar tol = 1e-6)`

Fit binary logistic regression. Labels must be 0 or 1.

`Vector predict_proba(const Matrix& X) const`

Return P(y = 1 | x) for each sample.

`Vector predict(const Matrix& X, Scalar threshold = 0.5) const`

Predict class labels using the given probability threshold.
## Linear Discriminant Analysis (LDA)
LDA is a generative classifier that models each class as a Gaussian distribution with shared covariance. It can also perform dimensionality reduction.
### Basic usage

```cpp
#include <mlpp/classifiers/LDA.h>

using namespace mlpp::classifiers;

LDA<double, int> lda;

// Fit to data
Eigen::MatrixXd X_train(100, 5); // n_samples × n_features
Eigen::VectorXi y_train(100);    // Integer class labels
lda.fit(X_train, y_train, 2);    // Project to 2 components

// Transform to lower dimension
auto X_projected = lda.transform(X_train);

// Access learned parameters
auto projection = lda.projection_matrix(); // n_features × n_components
auto means = lda.mean_vectors();           // n_features × n_classes
int n_classes = lda.num_classes();
```
### Methods

`void fit(const Matrix& X, const Labels& labels, int num_components = -1)`

Fit the LDA model. If num_components = -1, the maximum of n_classes - 1 components is used.

`Matrix transform(const Matrix& X) const`

Project data onto the LDA subspace (n_samples × n_components).

`void compute_projection_matrix(int num_components = -1)`

Recompute the projection matrix with a different number of components.
### How LDA works

- Compute the class means μ_c
- Compute the within-class scatter matrix S_W
- Compute the between-class scatter matrix S_B
- Find the projection W that maximizes the ratio (Wᵀ S_B W) / (Wᵀ S_W W)

The projection maximizes between-class separation while minimizing within-class variance.
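For two classes, the steps above reduce to Fisher's closed form w ∝ S_W⁻¹(μ₁ − μ₀). A minimal 2-D sketch with a hand-rolled 2×2 inverse; the scatter matrix and means are assumed inputs rather than computed from data:

```cpp
#include <array>
#include <cassert>

// Two-class LDA in 2-D: the optimal direction is w = S_W^-1 (mu1 - mu0).
using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

Vec2 fisher_direction(const Mat2& SW, const Vec2& mu0, const Vec2& mu1) {
    // Determinant and explicit inverse of the 2x2 within-class scatter.
    double det = SW[0][0] * SW[1][1] - SW[0][1] * SW[1][0];
    Vec2 d{mu1[0] - mu0[0], mu1[1] - mu0[1]}; // mean difference
    return Vec2{( SW[1][1] * d[0] - SW[0][1] * d[1]) / det,
                (-SW[1][0] * d[0] + SW[0][0] * d[1]) / det};
}
```

With an identity scatter matrix the direction is simply the mean difference, which matches the intuition that LDA points from one class centroid toward the other when within-class variance is isotropic.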
## Quadratic Discriminant Analysis (QDA)
QDA extends LDA by allowing each class to have its own covariance matrix, producing quadratic decision boundaries.
### Basic usage

```cpp
#include <mlpp/classifiers/QDA.h>

using namespace mlpp::classifiers;

QDA<double, int> qda;

// Fit to training data
qda.fit(X_train, y_train);

// Predict class labels
auto y_pred = qda.predict(X_test);

// Get log-likelihoods for each class
auto log_probs = qda.predict_log_likelihood(X_test);

// Access learned parameters
int n_classes = qda.num_classes();
auto means = qda.class_means();      // std::vector<Vector>
auto covs = qda.class_covariances(); // std::vector<Matrix>
```
### Methods

`void fit(const Matrix& X, const Labels& labels)`

Fit the QDA model. Estimates a mean μ_c and covariance Σ_c for each class.

`Labels predict(const Matrix& X) const`

Predict class labels for new samples.

`Matrix predict_log_likelihood(const Matrix& X) const`

Return log posterior probabilities log p(c|x) for each class (n_samples × n_classes).
### QDA vs LDA
Use LDA when:
- Classes have similar covariance structure
- Limited training data per class
- Want dimensionality reduction
- Need linear decision boundaries
Use QDA when:
- Classes have different covariance structures
- Sufficient training data per class
- Need flexible, quadratic decision boundaries
### Mathematical model

QDA models each class-conditional density as a Gaussian with its own mean and covariance:

```
p(x | c) = N(x; μ_c, Σ_c)
```

The log posterior is:

```
log p(c|x) = -1/2 [(x - μ_c)ᵀ Σ_c⁻¹ (x - μ_c) + log|Σ_c|] + log P(c) + const
```

The predicted class is argmax_c log p(c|x).
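For 1-D features and equal priors, the rule reduces to comparing per-class Gaussian log-densities and taking the argmax. A minimal sketch, independent of MLPP, with the terms shared across classes dropped:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// log p(c | x) up to a constant shared by all classes, for a 1-D Gaussian
// class model with equal priors: -1/2 [ (x - mu)^2 / var + log(var) ].
double log_posterior_1d(double x, double mu, double var) {
    double d = x - mu;
    return -0.5 * (d * d / var + std::log(var));
}

// Predicted class: argmax over the per-class scores.
std::size_t qda_predict_1d(double x, const std::vector<double>& mus,
                           const std::vector<double>& vars) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < mus.size(); ++c)
        if (log_posterior_1d(x, mus[c], vars[c]) >
            log_posterior_1d(x, mus[best], vars[best])) best = c;
    return best;
}
```

Because each class keeps its own variance, the boundary between two classes is quadratic in x; with a shared variance (the LDA case) the quadratic terms cancel and the boundary becomes linear.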
## Example: Comparing classifiers

```cpp
#include <mlpp/classifiers/SVM/SVM.hpp>
#include <mlpp/classifiers/SVM/Kernel/rkhs_kernels.hpp>
#include <mlpp/classifiers/logistic_regression.h>
#include <mlpp/classifiers/LDA.h>
#include <mlpp/classifiers/QDA.h>
#include <iostream>

using namespace mlpp::classifiers;
using namespace mlpp::classifiers::kernel;

int main() {
    // Load data
    Eigen::MatrixXd X_train, X_test;
    Eigen::VectorXi y_train, y_test;
    // ... load data ...

    // Try SVM with RBF kernel (expects samples as std::vector<Vector>
    // and labels in {-1, +1})
    std::vector<Vector> data_vec;  // Convert X_train rows to vector format
    Eigen::VectorXd labels_svm;    // Convert y_train to {-1, +1} labels
    auto kernel = kernels::rbf(1.0);
    SVM svm(data_vec, labels_svm, kernel, 1.0);
    svm.fit();

    // Try logistic regression
    LogisticRegressionMulti<double> logreg;
    logreg.fit(X_train, y_train.cast<double>(), 0.01, 1000, 1e-6);

    // Try LDA
    LDA<double, int> lda;
    lda.fit(X_train, y_train);

    // Try QDA
    QDA<double, int> qda;
    qda.fit(X_train, y_train);

    // Compare predictions
    auto pred_logreg = logreg.predict(X_test);
    auto proj_lda = lda.transform(X_test); // LDA projection; classify in the subspace
    auto pred_qda = qda.predict(X_test);

    // Compute accuracies
    // ...
    return 0;
}
```
## Choosing a classifier

| Algorithm | Decision boundary | Training speed | Pros | Cons |
|---|---|---|---|---|
| Logistic Regression | Linear | Fast | Simple, interpretable, probabilistic | Limited to linear boundaries |
| SVM (linear) | Linear | Medium | Maximum margin, kernel trick available | Requires label encoding |
| SVM (RBF) | Non-linear | Slow | Flexible, powerful for non-linear data | Slow training, hyperparameter tuning |
| LDA | Linear | Fast | Dimensionality reduction, robust with shared covariance | Assumes Gaussian distributions |
| QDA | Quadratic | Fast | Flexible covariance, quadratic boundaries | Requires more data per class |
Start with logistic regression or linear SVM for interpretability. Use RBF SVM or QDA when you need non-linear decision boundaries. Use LDA when you also need dimensionality reduction.