Classification algorithms predict discrete categorical labels from input features. MLPP provides implementations of classic discriminative and generative classifiers.
## Support Vector Machine (SVM)
SVM finds the maximum-margin hyperplane separating two classes using kernel functions. MLPP implements the dual formulation with kernel caching for efficiency.
SVM solves the dual problem:

```
maximize    W(α) = Σ_i α_i − 1/2 Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
subject to  0 ≤ α_i ≤ C
            Σ_i α_i y_i = 0
```

The decision function is:

```
f(x) = Σ_i α_i y_i K(x_i, x) + b
```
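As a plain-C++ sketch (independent of MLPP's classes), the decision function can be evaluated directly from the dual variables. The `alphas`, `ys`, `xs`, and `b` values are hypothetical stand-ins for a trained model, and a 1-D RBF kernel is assumed:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// 1-D RBF kernel: K(a, b) = exp(-gamma * (a - b)^2)
double rbf_k(double a, double b, double gamma) {
    double d = a - b;
    return std::exp(-gamma * d * d);
}

// Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
// alphas, ys, xs, b are assumed to come from a trained model.
double decision(const std::vector<double>& alphas,
                const std::vector<double>& ys,
                const std::vector<double>& xs,
                double b, double gamma, double x) {
    double f = b;
    for (std::size_t i = 0; i < xs.size(); ++i)
        f += alphas[i] * ys[i] * rbf_k(xs[i], x, gamma);
    return f;
}

// The sign of the decision value is the predicted label.
int predict(double f) { return f >= 0.0 ? +1 : -1; }
```

With two support vectors at -1 (class +1) and +1 (class -1), a query point near -1 lands on the positive side.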
### Basic usage

```cpp
#include <mlpp/classifiers/SVM/SVM.hpp>
#include <mlpp/classifiers/SVM/Kernel/rkhs_kernels.hpp>

using namespace mlpp::classifiers::kernel;

// Prepare data
std::vector<Vector> data;  // Training samples
Eigen::VectorXd labels;    // Class labels in {-1, +1}

// Create RBF kernel
auto kernel = kernels::rbf(1.0); // gamma = 1.0

// Create SVM with C = 1.0
SVM svm(data, labels, kernel, 1.0);

// Train model
svm.fit();

// Predict new sample
Vector x_new;
int prediction = svm.predict(x_new); // Returns -1 or +1

// Get decision function value
double score = svm.decision(x_new);

// Get support vector indices
auto sv_indices = svm.support_indices();
```
### Constructor parameters

- `data` (`const std::vector<Vector>&`): Training samples as a vector of Eigen vectors.
- `labels` (`Eigen::VectorXd`): Class labels with values in {-1, +1} for binary classification.
- `kernel`: Kernel function K(x, y). Common choices:
  - `kernels::rbf(gamma)`: Radial basis function
  - `kernels::linear()`: Linear kernel
  - `kernels::polynomial(degree, coef0)`: Polynomial kernel
- `C` (`double`): Soft margin penalty parameter, C > 0. Higher values penalize margin violations more strictly but may overfit.
### Methods

`void fit()`

Train the SVM model using the configured optimization strategy (typically the SMO algorithm).

`int predict(const Vector& x) const`

Predict the class label (+1 or -1) for a new sample.

`double decision(const Vector& x) const`

Evaluate the decision function f(x) = Σ α_i y_i K(x_i, x) + b. Positive values indicate class +1, negative values class -1.

`std::vector<std::size_t> support_indices(double eps = 1e-8) const`

Return the indices of the support vectors (samples with α_i > eps).
### Kernel selection

**RBF kernel**

Radial basis function (Gaussian) kernel: K(x, y) = exp(-γ ||x - y||²)

```cpp
auto kernel = kernels::rbf(1.0); // gamma = 1.0
SVM svm(data, labels, kernel, 1.0);
```

Good for non-linear boundaries. Tune γ via cross-validation.

**Linear kernel**

Simple dot product: K(x, y) = xᵀy

```cpp
auto kernel = kernels::linear();
SVM svm(data, labels, kernel, 1.0);
```

Fast and interpretable. Use for linearly separable data.

**Polynomial kernel**

Polynomial kernel of degree d:

```cpp
auto kernel = kernels::polynomial(3, 1.0); // degree=3, coef0=1.0
SVM svm(data, labels, kernel, 1.0);
```

Models polynomial relationships. Can be numerically unstable for large d.
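Stripped of the `kernels::` factories, the three kernels reduce to a few lines each. A minimal plain-C++ sketch using `std::vector<double>` in place of Eigen vectors (the polynomial form (xᵀy + coef0)^d shown here is the standard one and an assumption about MLPP's exact definition):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Linear: K(x, y) = x . y
double linear_kernel(const Vec& x, const Vec& y) { return dot(x, y); }

// RBF: K(x, y) = exp(-gamma * ||x - y||^2)
double rbf_kernel(const Vec& x, const Vec& y, double gamma) {
    double sq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double d = x[i] - y[i];
        sq += d * d;
    }
    return std::exp(-gamma * sq);
}

// Polynomial: K(x, y) = (x . y + coef0)^degree
double poly_kernel(const Vec& x, const Vec& y, int degree, double coef0) {
    return std::pow(dot(x, y) + coef0, degree);
}
```

Note that the RBF kernel of a point with itself is always 1, which is why RBF Gram matrices have a unit diagonal.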
## Logistic regression
Logistic regression models class probabilities using the sigmoid (logistic) function. MLPP provides both binary and multi-class variants.
### Binary logistic regression

```cpp
#include <mlpp/classifiers/logistic_regression.h>

using namespace mlpp::classifiers;

LogisticRegressionBinary<double> model;

// Fit to training data (labels must be 0 or 1)
model.fit(X_train, y_train,
          0.01,   // learning_rate
          1000,   // max_iter
          1e-6);  // tol

// Predict probabilities
auto probs = model.predict_proba(X_test);

// Predict class labels (threshold = 0.5)
auto y_pred = model.predict(X_test, 0.5);

// Get coefficients
auto theta = model.coefficients();
auto intercept = model.intercept();
```
The model estimates:

```
P(y = 1 | x; θ) = σ(θᵀ x̃),   where σ(z) = 1 / (1 + exp(-z))
```
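A minimal sketch of that probability model, independent of the library; the `predict_label` helper and its 0.5 default threshold mirror the `predict` call above but are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// sigma(z) = 1 / (1 + exp(-z)), mapping any real score into (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// P(y = 1 | x) = sigma(theta^T x_tilde); predict 1 when it exceeds the threshold.
int predict_label(double theta_dot_x, double threshold = 0.5) {
    return sigmoid(theta_dot_x) >= threshold ? 1 : 0;
}
```

Since σ(0) = 0.5, the 0.5 probability threshold corresponds exactly to the sign of the linear score θᵀx̃.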
### Multi-class logistic regression

Uses a one-vs-rest strategy for K ≥ 2 classes:

```cpp
LogisticRegressionMulti<double> model;

// Fit to multi-class data (labels are integer class indices)
model.fit(X_train, y_train,
          0.01,   // learning_rate
          1000,   // max_iter
          1e-6);  // tol

// Predict class probabilities (n_samples × n_classes)
auto probs = model.predict_proba(X_test);

// Predict class labels
auto y_pred = model.predict(X_test);

// Get coefficient matrix (n_classes × n_features+1)
auto thetas = model.coefficients();
```
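Under one-vs-rest, each of the K binary models scores the sample and the highest-scoring class wins. A minimal sketch of that final argmax step (the per-class probabilities here are made up, not produced by MLPP):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given one P(y = k | x) per class from K binary models, pick the argmax.
std::size_t predict_ovr(const std::vector<double>& class_probs) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < class_probs.size(); ++k)
        if (class_probs[k] > class_probs[best]) best = k;
    return best;
}
```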
### Parameters

- `learning_rate` (`Scalar`, default `0.01`): Gradient descent step size. Smaller values are more stable but converge more slowly.
- `max_iter` (`std::size_t`, default `1000`): Maximum number of gradient descent iterations.
- `tol` (`Scalar`, default `1e-6`): Convergence tolerance. Training stops when ||Δθ||∞ < tol.
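All three parameters drive the same loop: step along the negative gradient, and stop when the update falls below `tol` or after `max_iter` iterations. A hypothetical scalar sketch, minimizing (θ − 3)² instead of the logistic loss so the mechanics stand alone:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Minimize f(theta) = (theta - 3)^2 by gradient descent; the gradient is
// 2 * (theta - 3). Stops when |delta_theta| < tol (the infinity norm in
// one dimension) or after max_iter steps.
double gradient_descent(double theta, double learning_rate,
                        std::size_t max_iter, double tol) {
    for (std::size_t it = 0; it < max_iter; ++it) {
        double delta = -learning_rate * 2.0 * (theta - 3.0);
        theta += delta;
        if (std::abs(delta) < tol) break;
    }
    return theta;
}
```

With `learning_rate = 0.1` the error shrinks by a factor of 0.8 per step, so the loop converges to θ ≈ 3 well within 1000 iterations.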
### Methods (Binary)

`void fit(const Matrix& X, const Vector& y, Scalar learning_rate = 0.01, std::size_t max_iter = 1000, Scalar tol = 1e-6)`

Fit binary logistic regression. Labels must be 0 or 1.

`Vector predict_proba(const Matrix& X) const`

Return P(y = 1 | x) for each sample.

`Vector predict(const Matrix& X, Scalar threshold = 0.5) const`

Predict class labels using the given probability threshold.
## Linear Discriminant Analysis (LDA)
LDA is a generative classifier that models each class as a Gaussian distribution with shared covariance. It can also perform dimensionality reduction.
### Basic usage

```cpp
#include <mlpp/classifiers/LDA.h>

using namespace mlpp::classifiers;

LDA<double, int> lda;

// Fit to data
Eigen::MatrixXd X_train(100, 5); // n_samples × n_features
Eigen::VectorXi y_train(100);    // Integer class labels
lda.fit(X_train, y_train, 2);    // Project to 2 components

// Transform to lower dimension
auto X_projected = lda.transform(X_train);

// Access learned parameters
auto projection = lda.projection_matrix(); // n_features × n_components
auto means = lda.mean_vectors();           // n_features × n_classes
int n_classes = lda.num_classes();
```
### Methods

`void fit(const Matrix& X, const Labels& labels, int num_components = -1)`

Fit the LDA model. If num_components = -1, the maximum of n_classes - 1 components is used.

`Matrix transform(const Matrix& X) const`

Project data onto the LDA subspace (n_samples × n_components).

`void compute_projection_matrix(int num_components = -1)`

Recompute the projection matrix with a different number of components.
### How LDA works

- Compute the class means μ_c
- Compute the within-class scatter matrix S_W
- Compute the between-class scatter matrix S_B
- Find the projection W that maximizes the ratio (Wᵀ S_B W) / (Wᵀ S_W W)

The projection maximizes between-class separation while minimizing within-class variance.
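For two classes, the steps above reduce to Fisher's closed form w ∝ S_W⁻¹(μ₁ − μ₀). A minimal 2-D sketch with a hand-rolled 2×2 inverse; the scatter matrix and means are assumed inputs rather than computed from data:

```cpp
#include <array>
#include <cassert>

// Two-class LDA in 2-D: the optimal direction is w = S_W^-1 (mu1 - mu0).
using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

Vec2 fisher_direction(const Mat2& SW, const Vec2& mu0, const Vec2& mu1) {
    // Determinant and explicit inverse of the 2x2 within-class scatter.
    double det = SW[0][0] * SW[1][1] - SW[0][1] * SW[1][0];
    Vec2 d{mu1[0] - mu0[0], mu1[1] - mu0[1]}; // mean difference
    return Vec2{( SW[1][1] * d[0] - SW[0][1] * d[1]) / det,
                (-SW[1][0] * d[0] + SW[0][0] * d[1]) / det};
}
```

With an identity scatter matrix the direction is simply the mean difference, which matches the intuition that LDA points from one class centroid toward the other when within-class variance is isotropic.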
## Quadratic Discriminant Analysis (QDA)
QDA extends LDA by allowing each class to have its own covariance matrix, producing quadratic decision boundaries.
### Basic usage

```cpp
#include <mlpp/classifiers/QDA.h>

using namespace mlpp::classifiers;

QDA<double, int> qda;

// Fit to training data
qda.fit(X_train, y_train);

// Predict class labels
auto y_pred = qda.predict(X_test);

// Get log-likelihoods for each class
auto log_probs = qda.predict_log_likelihood(X_test);

// Access learned parameters
int n_classes = qda.num_classes();
auto means = qda.class_means();      // std::vector<Vector>
auto covs = qda.class_covariances(); // std::vector<Matrix>
```
### Methods

`void fit(const Matrix& X, const Labels& labels)`

Fit the QDA model. Estimates a mean μ_c and covariance Σ_c for each class.

`Labels predict(const Matrix& X) const`

Predict class labels for new samples.

`Matrix predict_log_likelihood(const Matrix& X) const`

Return log posterior probabilities log p(c|x) for each class (n_samples × n_classes).
### QDA vs LDA
Use LDA when:
- Classes have similar covariance structure
- Limited training data per class
- Want dimensionality reduction
- Need linear decision boundaries
Use QDA when:
- Classes have different covariance structures
- Sufficient training data per class
- Need flexible, quadratic decision boundaries
### Mathematical model

QDA models each class-conditional density as a Gaussian with its own mean and covariance:

```
p(x | c) = N(x; μ_c, Σ_c)
```

The log posterior is:

```
log p(c|x) = -1/2 [(x - μ_c)ᵀ Σ_c⁻¹ (x - μ_c) + log|Σ_c|] + log P(c) + const
```

The predicted class is argmax_c log p(c|x).
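For 1-D features and equal priors, the rule reduces to comparing per-class Gaussian log-densities and taking the argmax. A minimal sketch, independent of MLPP, with the terms shared across classes dropped:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// log p(c | x) up to a constant shared by all classes, for a 1-D Gaussian
// class model with equal priors: -1/2 [ (x - mu)^2 / var + log(var) ].
double log_posterior_1d(double x, double mu, double var) {
    double d = x - mu;
    return -0.5 * (d * d / var + std::log(var));
}

// Predicted class: argmax over the per-class scores.
std::size_t qda_predict_1d(double x, const std::vector<double>& mus,
                           const std::vector<double>& vars) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < mus.size(); ++c)
        if (log_posterior_1d(x, mus[c], vars[c]) >
            log_posterior_1d(x, mus[best], vars[best])) best = c;
    return best;
}
```

Because each class keeps its own variance, the boundary between two classes is quadratic in x; with a shared variance (the LDA case) the quadratic terms cancel and the boundary becomes linear.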
## Example: Comparing classifiers

```cpp
#include <mlpp/classifiers/SVM/SVM.hpp>
#include <mlpp/classifiers/SVM/Kernel/rkhs_kernels.hpp>
#include <mlpp/classifiers/logistic_regression.h>
#include <mlpp/classifiers/LDA.h>
#include <mlpp/classifiers/QDA.h>
#include <iostream>

using namespace mlpp::classifiers;
using namespace mlpp::classifiers::kernel;

int main() {
    // Load data
    Eigen::MatrixXd X_train, X_test;
    Eigen::VectorXi y_train, y_test;
    // ... load data ...

    // Try SVM with RBF kernel (expects samples as std::vector<Vector>
    // and labels in {-1, +1})
    std::vector<Vector> data_vec;  // Convert X_train rows to vector format
    Eigen::VectorXd labels_svm;    // Convert y_train to {-1, +1} labels
    auto kernel = kernels::rbf(1.0);
    SVM svm(data_vec, labels_svm, kernel, 1.0);
    svm.fit();

    // Try logistic regression
    LogisticRegressionMulti<double> logreg;
    logreg.fit(X_train, y_train.cast<double>(), 0.01, 1000, 1e-6);

    // Try LDA
    LDA<double, int> lda;
    lda.fit(X_train, y_train);

    // Try QDA
    QDA<double, int> qda;
    qda.fit(X_train, y_train);

    // Compare predictions
    auto pred_logreg = logreg.predict(X_test);
    auto proj_lda = lda.transform(X_test); // LDA projection; classify in the subspace
    auto pred_qda = qda.predict(X_test);

    // Compute accuracies
    // ...
    return 0;
}
```
## Choosing a classifier

| Algorithm | Decision boundary | Training speed | Pros | Cons |
|---|---|---|---|---|
| Logistic Regression | Linear | Fast | Simple, interpretable, probabilistic | Limited to linear boundaries |
| SVM (linear) | Linear | Medium | Maximum margin, kernel trick available | Requires label encoding |
| SVM (RBF) | Non-linear | Slow | Flexible, powerful for non-linear data | Slow training, hyperparameter tuning |
| LDA | Linear | Fast | Dimensionality reduction, robust with shared covariance | Assumes Gaussian distributions |
| QDA | Quadratic | Fast | Flexible covariance, quadratic boundaries | Requires more data per class |
Start with logistic regression or linear SVM for interpretability. Use RBF SVM or QDA when you need non-linear decision boundaries. Use LDA when you also need dimensionality reduction.