Classification algorithms predict discrete categorical labels from input features. MLPP provides implementations of classic discriminative and generative classifiers.

Support Vector Machine (SVM)

SVM finds the maximum-margin hyperplane separating two classes using kernel functions. MLPP implements the dual formulation with kernel caching for efficiency.

Dual formulation

SVM solves:
maximize   W(α) = Σ α_i − 1/2 Σ Σ α_i α_j y_i y_j K(x_i, x_j)

subject to:
  0 ≤ α_i ≤ C
  Σ α_i y_i = 0
The decision function is:
f(x) = Σ α_i y_i K(x_i, x) + b
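To make this concrete, the decision function can be evaluated directly from a learned coefficient vector α, labels y, and bias b — here with an RBF kernel. This is a standalone sketch, independent of MLPP's internals:

#include <Eigen/Dense>
#include <cmath>
#include <vector>

using Vector = Eigen::VectorXd;

// RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)
double rbf(const Vector& x, const Vector& y, double gamma) {
    return std::exp(-gamma * (x - y).squaredNorm());
}

// f(x) = sum_i alpha_i y_i K(x_i, x) + b
// Only support vectors (alpha_i > 0) contribute to the sum.
double decision(const std::vector<Vector>& X, const Vector& y,
                const Vector& alpha, double b, const Vector& x, double gamma) {
    double f = b;
    for (std::size_t i = 0; i < X.size(); ++i)
        if (alpha(i) > 1e-8)  // skip non-support vectors
            f += alpha(i) * y(i) * rbf(X[i], x, gamma);
    return f;
}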

Basic usage

#include <mlpp/classifiers/SVM/SVM.hpp>
#include <mlpp/classifiers/SVM/Kernel/rkhs_kernels.hpp>

using namespace mlpp::classifiers::kernel;

// Prepare data
std::vector<Vector> data;  // Training samples
Eigen::VectorXd labels;    // Class labels in {-1, +1}

// Create RBF kernel
auto kernel = kernels::rbf(1.0);  // gamma = 1.0

// Create SVM with C = 1.0
SVM svm(data, labels, kernel, 1.0);

// Train model
svm.fit();

// Predict new sample
Vector x_new;
int prediction = svm.predict(x_new);  // Returns -1 or +1

// Get decision function value
double score = svm.decision(x_new);

// Get support vector indices
auto sv_indices = svm.support_indices();

Constructor parameters

data
const std::vector<Vector>&
Training samples as a vector of Eigen vectors.
labels
LabelVector
Class labels as Eigen::VectorXd with values in {-1, +1} for binary classification.
kernel
KernelFunction
Kernel function K(x, y). Common choices:
  • kernels::rbf(gamma) - Radial basis function
  • kernels::linear() - Linear kernel
  • kernels::polynomial(degree, coef0) - Polynomial kernel
C
double
Soft margin penalty parameter C > 0. Higher values enforce stricter margins but may overfit.
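Because the best C is problem-dependent, a simple grid search over a held-out validation set is a reasonable starting point. A sketch using the constructor above (the split into training and validation sets is assumed to be prepared elsewhere):

// Pick C by validation accuracy over a small grid (sketch).
double select_C(const std::vector<Vector>& X_tr, const Eigen::VectorXd& y_tr,
                const std::vector<Vector>& X_val, const Eigen::VectorXd& y_val) {
    auto kernel = kernels::rbf(1.0);
    double best_c = 1.0, best_acc = -1.0;
    for (double c : {0.1, 1.0, 10.0, 100.0}) {
        SVM svm(X_tr, y_tr, kernel, c);  // constructor from the table above
        svm.fit();
        int correct = 0;
        for (std::size_t i = 0; i < X_val.size(); ++i)
            if (svm.predict(X_val[i]) == static_cast<int>(y_val(i))) ++correct;
        double acc = static_cast<double>(correct) / X_val.size();
        if (acc > best_acc) { best_acc = acc; best_c = c; }
    }
    return best_c;
}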

Methods

fit
void
void fit()
Train the SVM model using the configured optimization strategy (typically the SMO algorithm).
predict
int
int predict(const Vector& x) const
Predict class label (+1 or -1) for a new sample.
decision
double
double decision(const Vector& x) const
Evaluate the decision function f(x) = Σ α_i y_i K(x_i, x) + b. Positive values indicate class +1; negative values indicate class -1.
support_indices
std::vector<std::size_t>
std::vector<std::size_t> support_indices(double eps = 1e-8) const
Return indices of support vectors (samples with α_i > eps).

Kernel selection

Radial basis function (Gaussian) kernel:
K(x, y) = exp(-γ ||x - y||²)
auto kernel = kernels::rbf(1.0);  // gamma = 1.0
SVM svm(data, labels, kernel, 1.0);
Good for non-linear boundaries. Tune γ via cross-validation.
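The other kernels listed under the constructor parameters are created the same way. For example (the exact polynomial form, e.g. (x·y + coef0)^degree, is the usual convention but should be checked against the header):

// Linear kernel — recovers a standard linear SVM
auto lin = kernels::linear();
SVM svm_lin(data, labels, lin, 1.0);

// Polynomial kernel of degree 3 with coef0 = 1.0
auto poly = kernels::polynomial(3, 1.0);
SVM svm_poly(data, labels, poly, 1.0);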

Logistic regression

Logistic regression models class probabilities using the sigmoid (logistic) function. MLPP provides both binary and multi-class variants.

Binary logistic regression

#include <mlpp/classifiers/logistic_regression.h>

using namespace mlpp::classifiers;

LogisticRegressionBinary<double> model;

// Fit to training data (labels must be 0 or 1)
model.fit(X_train, y_train,
          0.01,    // learning_rate
          1000,    // max_iter
          1e-6);   // tol

// Predict probabilities
auto probs = model.predict_proba(X_test);

// Predict class labels (threshold = 0.5)
auto y_pred = model.predict(X_test, 0.5);

// Get coefficients
auto theta = model.coefficients();
auto intercept = model.intercept();
The model estimates:
P(y = 1 | x; θ) = σ(θᵀ x̃)

where σ(z) = 1/(1 + exp(-z))
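For intuition, the fitted model's probability can be reproduced by hand from coefficients() and intercept(). A minimal sketch, assuming x carries the raw features and the intercept is kept separate, as in the accessors above:

#include <Eigen/Dense>
#include <cmath>

// sigma(z) = 1 / (1 + exp(-z))
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// P(y = 1 | x) = sigma(theta^T x + intercept)
double predict_one(const Eigen::VectorXd& theta, double intercept,
                   const Eigen::VectorXd& x) {
    return sigmoid(theta.dot(x) + intercept);
}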

Multi-class logistic regression

Uses a one-vs-rest strategy for K ≥ 2 classes:
LogisticRegressionMulti<double> model;

// Fit to multi-class data (labels are integer class indices)
model.fit(X_train, y_train,
          0.01,   // learning_rate
          1000,   // max_iter
          1e-6);  // tol

// Predict class probabilities (n_samples × n_classes)
auto probs = model.predict_proba(X_test);

// Predict class labels
auto y_pred = model.predict(X_test);

// Get coefficient matrix (n_classes × (n_features + 1))
auto thetas = model.coefficients();
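Conceptually, one-vs-rest reduces a K-class problem to K binary problems: classifier k is trained on "class k vs. everything else", and prediction picks the class whose classifier is most confident. A minimal sketch of the idea built from the binary model above — MLPP's internal implementation may differ, and the Matrix/Vector aliases are assumed to be Eigen::MatrixXd/Eigen::VectorXd:

#include <mlpp/classifiers/logistic_regression.h>
#include <Eigen/Dense>
#include <vector>

using namespace mlpp::classifiers;
using Matrix = Eigen::MatrixXd;
using Vector = Eigen::VectorXd;

// Train one binary classifier per class: class k vs. the rest.
std::vector<LogisticRegressionBinary<double>> train_ovr(
        const Matrix& X, const Vector& y, int n_classes) {
    std::vector<LogisticRegressionBinary<double>> models(n_classes);
    for (int k = 0; k < n_classes; ++k) {
        // Relabel: 1 for class k, 0 otherwise
        Vector y_bin = (y.array() == static_cast<double>(k)).cast<double>();
        models[k].fit(X, y_bin, 0.01, 1000, 1e-6);
    }
    return models;
}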

Parameters

learning_rate
Scalar
default:"0.01"
Gradient descent step size. Smaller values are more stable but slower.
max_iter
std::size_t
default:"1000"
Maximum number of gradient descent iterations.
tol
Scalar
default:"1e-6"
Convergence tolerance. Training stops when ||Δθ||∞ < tol.
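For reference, the stopping rule corresponds to a loop of roughly this shape — an illustrative skeleton of the convention only, with a hypothetical gradient(), not MLPP's actual implementation:

// Gradient-descent skeleton with the ||delta_theta||_inf < tol stopping rule.
Eigen::VectorXd theta = Eigen::VectorXd::Zero(n_params);
for (std::size_t it = 0; it < max_iter; ++it) {
    Eigen::VectorXd step = learning_rate * gradient(theta);  // gradient() is hypothetical
    theta -= step;
    if (step.lpNorm<Eigen::Infinity>() < tol) break;  // converged
}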

Methods (Binary)

fit
void
void fit(const Matrix& X, const Vector& y,
         Scalar learning_rate = 0.01,
         std::size_t max_iter = 1000,
         Scalar tol = 1e-6)
Fit binary logistic regression. Labels must be 0 or 1.
predict_proba
Vector
Vector predict_proba(const Matrix& X) const
Return P(y=1|x) for each sample.
predict
Vector
Vector predict(const Matrix& X, Scalar threshold = 0.5) const
Predict class labels using given probability threshold.

Linear Discriminant Analysis (LDA)

LDA is a generative classifier that models each class as a Gaussian distribution with shared covariance. It can also perform dimensionality reduction.

Basic usage

#include <mlpp/classifiers/LDA.h>

using namespace mlpp::classifiers;

LDA<double, int> lda;

// Fit to data
Eigen::MatrixXd X_train(100, 5);  // n_samples × n_features
Eigen::VectorXi y_train(100);      // Integer class labels

lda.fit(X_train, y_train, 2);  // Project to 2 components

// Transform to lower dimension
auto X_projected = lda.transform(X_train);

// Access learned parameters
auto projection = lda.projection_matrix();  // n_features × n_components
auto means = lda.mean_vectors();            // n_features × n_classes
int n_classes = lda.num_classes();

Methods

fit
void
void fit(const Matrix& X, const Labels& labels, int num_components = -1)
Fit LDA model. If num_components = -1, uses the maximum of n_classes - 1 components.
transform
Matrix
Matrix transform(const Matrix& X) const
Project data to LDA subspace (n_samples × n_components).
compute_projection_matrix
void
void compute_projection_matrix(int num_components = -1)
Recompute projection matrix with different number of components.

How LDA works

  1. Compute class means μ_c
  2. Compute within-class scatter matrix S_W
  3. Compute between-class scatter matrix S_B
  4. Find projection W that maximizes: Wᵀ S_B W / Wᵀ S_W W
The projection maximizes class separation while minimizing within-class variance.
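The scatter matrices from steps 2 and 3 can be written down directly in Eigen. A sketch, assuming integer labels 0..K-1 (step 4 then reduces to a generalized eigenvalue problem on S_W and S_B):

#include <Eigen/Dense>

// S_W = sum_c sum_{i in c} (x_i - mu_c)(x_i - mu_c)^T   (within-class scatter)
// S_B = sum_c n_c (mu_c - mu)(mu_c - mu)^T               (between-class scatter)
void scatter_matrices(const Eigen::MatrixXd& X, const Eigen::VectorXi& y,
                      int n_classes, Eigen::MatrixXd& S_W, Eigen::MatrixXd& S_B) {
    const Eigen::Index d = X.cols();
    Eigen::VectorXd mu = X.colwise().mean().transpose();  // global mean
    S_W = Eigen::MatrixXd::Zero(d, d);
    S_B = Eigen::MatrixXd::Zero(d, d);
    for (int c = 0; c < n_classes; ++c) {
        // Class mean and count
        Eigen::VectorXd mu_c = Eigen::VectorXd::Zero(d);
        int n_c = 0;
        for (Eigen::Index i = 0; i < X.rows(); ++i)
            if (y(i) == c) { mu_c += X.row(i).transpose(); ++n_c; }
        mu_c /= n_c;
        // Accumulate scatter contributions
        for (Eigen::Index i = 0; i < X.rows(); ++i)
            if (y(i) == c) {
                Eigen::VectorXd diff = X.row(i).transpose() - mu_c;
                S_W += diff * diff.transpose();
            }
        Eigen::VectorXd db = mu_c - mu;
        S_B += static_cast<double>(n_c) * db * db.transpose();
    }
}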

Quadratic Discriminant Analysis (QDA)

QDA extends LDA by allowing each class to have its own covariance matrix, producing quadratic decision boundaries.

Basic usage

#include <mlpp/classifiers/QDA.h>

using namespace mlpp::classifiers;

QDA<double, int> qda;

// Fit to training data
qda.fit(X_train, y_train);

// Predict class labels
auto y_pred = qda.predict(X_test);

// Get log-likelihoods for each class
auto log_probs = qda.predict_log_likelihood(X_test);

// Access learned parameters
int n_classes = qda.num_classes();
auto means = qda.class_means();          // std::vector<Vector>
auto covs = qda.class_covariances();     // std::vector<Matrix>

Methods

fit
void
void fit(const Matrix& X, const Labels& labels)
Fit QDA model. Estimates mean μ_c and covariance Σ_c for each class.
predict
Labels
Labels predict(const Matrix& X) const
Predict class labels for new samples.
predict_log_likelihood
Matrix
Matrix predict_log_likelihood(const Matrix& X) const
Return log posterior probabilities: log p(c|x) for each class (n_samples × n_classes).

QDA vs LDA

Use LDA when:
  • Classes have similar covariance structure
  • Limited training data per class
  • Want dimensionality reduction
  • Need linear decision boundaries
Use QDA when:
  • Classes have different covariance structures
  • Sufficient training data per class
  • Need flexible, quadratic decision boundaries

Mathematical model

QDA models each class as:
P(x|c) = N(x; μ_c, Σ_c)
The log posterior is:
log p(c|x) = -1/2 [(x-μ_c)ᵀ Σ_c⁻¹ (x-μ_c) + log|Σ_c|] + log P(c) + const
The predicted class is: argmax_c log p(c|x)
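Evaluating the log posterior for one class reduces to a Mahalanobis distance plus a log determinant. A sketch using Eigen's Cholesky factorization (not MLPP's internals); in practice you would call this once per class and take the argmax:

#include <Eigen/Dense>
#include <cmath>

// log p(c|x) up to an additive constant shared by all classes:
// -1/2 [(x - mu)^T Sigma^{-1} (x - mu) + log|Sigma|] + log P(c)
double qda_log_posterior(const Eigen::VectorXd& x, const Eigen::VectorXd& mu,
                         const Eigen::MatrixXd& sigma, double log_prior) {
    Eigen::LLT<Eigen::MatrixXd> llt(sigma);           // Sigma = L L^T
    Eigen::VectorXd z = llt.matrixL().solve(x - mu);  // z = L^{-1} (x - mu)
    double mahalanobis = z.squaredNorm();             // (x-mu)^T Sigma^{-1} (x-mu)
    double log_det = 2.0 * llt.matrixL().toDenseMatrix()
                               .diagonal().array().log().sum();
    return -0.5 * (mahalanobis + log_det) + log_prior;
}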

Example: Comparing classifiers

#include <mlpp/classifiers/SVM/SVM.hpp>
#include <mlpp/classifiers/SVM/Kernel/rkhs_kernels.hpp>
#include <mlpp/classifiers/logistic_regression.h>
#include <mlpp/classifiers/LDA.h>
#include <mlpp/classifiers/QDA.h>
#include <iostream>

using namespace mlpp::classifiers;
using namespace mlpp::classifiers::kernel;

int main() {
    // Load data
    Eigen::MatrixXd X_train, X_test;
    Eigen::VectorXi y_train, y_test;
    // ... load data ...

    // Try SVM with RBF kernel (binary problems; labels in {-1, +1})
    std::vector<Vector> data_vec;  // Training samples in vector format
    Eigen::VectorXd labels_svm;    // ... convert from X_train / y_train ...
    auto kernel = kernels::rbf(1.0);
    SVM svm(data_vec, labels_svm, kernel, 1.0);
    svm.fit();

    // Try logistic regression
    LogisticRegressionMulti<double> logreg;
    logreg.fit(X_train, y_train.cast<double>(), 0.01, 1000, 1e-6);

    // Try LDA
    LDA<double, int> lda;
    lda.fit(X_train, y_train);

    // Try QDA
    QDA<double, int> qda;
    qda.fit(X_train, y_train);

    // Compare predictions
    auto pred_logreg = logreg.predict(X_test);
    auto proj_lda = lda.transform(X_test);  // Projection only; classify in the subspace
    auto pred_qda = qda.predict(X_test);

    // Compute accuracies
    // ...

    return 0;
}

Choosing a classifier

Algorithm           | Decision boundary | Training speed | Pros                                                     | Cons
Logistic Regression | Linear            | Fast           | Simple, interpretable, probabilistic                     | Limited to linear boundaries
SVM (linear)        | Linear            | Medium         | Maximum margin, kernel trick available                   | Requires label encoding
SVM (RBF)           | Non-linear        | Slow           | Flexible, powerful for non-linear data                   | Slow training, hyperparameter tuning
LDA                 | Linear            | Fast           | Dimensionality reduction, robust with shared covariance  | Assumes Gaussian distributions
QDA                 | Quadratic         | Fast           | Flexible covariance, quadratic boundaries                | Requires more data per class
Start with logistic regression or linear SVM for interpretability. Use RBF SVM or QDA when you need non-linear decision boundaries. Use LDA when you also need dimensionality reduction.
