Supervised learning is a machine learning paradigm where models are trained on labeled data to learn the mapping between input features and target outputs. MLPP provides a comprehensive suite of supervised learning algorithms implemented in modern C++ with Eigen for efficient linear algebra.

Available algorithms

Regression

Linear, ridge, and polynomial regression for continuous target prediction

Classification

SVM, LDA, QDA, and logistic regression for categorical target prediction

Decision trees

Tree-based models for both classification and regression tasks

Core concepts

Training and prediction

All supervised learning models in MLPP follow a consistent API pattern:
// 1. Create model with hyperparameters
Model model(param1, param2);

// 2. Fit to training data
model.fit(X_train, y_train);

// 3. Make predictions
auto y_pred = model.predict(X_test);

// 4. Evaluate performance
auto score = model.score(X_test, y_test);
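To make the four-step pattern concrete without depending on any particular MLPP class, here is a toy stand-in model that exposes the same fit/predict/score surface. This is not an MLPP type: it simply "fits" by storing the mean of the training targets and predicts that mean everywhere.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for the construct/fit/predict/score pattern.
class MeanModel {
public:
    explicit MeanModel(double initial = 0.0) : mean_(initial) {}  // 1. construct

    // 2. Fit to training data (features are ignored by this trivial model).
    void fit(const std::vector<std::vector<double>>& X,
             const std::vector<double>& y) {
        (void)X;
        double sum = 0.0;
        for (double v : y) sum += v;
        if (!y.empty()) mean_ = sum / static_cast<double>(y.size());
    }

    // 3. Predict: one value per test row.
    std::vector<double> predict(const std::vector<std::vector<double>>& X) const {
        return std::vector<double>(X.size(), mean_);
    }

    // 4. Score: negative mean squared error (higher is better).
    double score(const std::vector<std::vector<double>>& X,
                 const std::vector<double>& y) const {
        std::vector<double> p = predict(X);
        double mse = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) {
            double r = p[i] - y[i];
            mse += r * r;
        }
        return y.empty() ? 0.0 : -mse / static_cast<double>(y.size());
    }

private:
    double mean_;
};
```

Real MLPP models follow the same call sequence, with hyperparameters supplied to the constructor and Eigen matrices in place of nested vectors.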

Feature matrices

MLPP uses Eigen for all matrix operations. Input features are typically represented as row-major matrices:
using Matrix = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
using Vector = Eigen::Matrix<double, Eigen::Dynamic, 1>;

// X has shape (n_samples, n_features)
Matrix X(100, 5);  // 100 samples, 5 features

// y has length n_samples
Vector y(100);

Regularization

Many models support L2 regularization (ridge penalty) to prevent overfitting:
// Linear regression with λ = 0.1
LinearRegression model(true, 0.1);
Regularization adds a penalty term to the loss function:
L(w) = MSE(w) + (λ/2) ||w||²
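The penalized loss above can be computed directly. The sketch below evaluates L(w) for a linear model ŷ = Xw (no intercept, for brevity) using plain vectors rather than MLPP's internal types:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L(w) = MSE(w) + (lambda / 2) * ||w||^2 for predictions y_hat = X w.
double ridge_loss(const std::vector<std::vector<double>>& X,
                  const std::vector<double>& y,
                  const std::vector<double>& w,
                  double lambda) {
    double mse = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        double pred = 0.0;
        for (std::size_t j = 0; j < w.size(); ++j) pred += X[i][j] * w[j];
        const double r = pred - y[i];
        mse += r * r;
    }
    mse /= static_cast<double>(X.size());

    double penalty = 0.0;
    for (double wj : w) penalty += wj * wj;

    return mse + 0.5 * lambda * penalty;
}
```

With λ = 0 this reduces to the plain MSE; increasing λ shrinks the weights toward zero at the cost of a slightly higher training error.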

Model selection

Always split your data into training and test sets to evaluate model performance on unseen data. Use cross-validation for hyperparameter tuning.
Choose algorithms based on your problem characteristics:
  • Linear relationships: Linear regression, logistic regression
  • Non-linear relationships: Polynomial regression, decision trees, kernel SVM
  • High-dimensional data: Ridge regression, LDA for dimensionality reduction
  • Complex decision boundaries: SVM with kernels, decision trees
  • Interpretability required: Linear models, decision trees
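A train/test split can be done by shuffling row indices and partitioning them. The helper below is a minimal sketch using the standard library, not an MLPP utility; it returns (train, test) index sets that you would then use to slice your feature matrix and target vector:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Shuffle [0, n_samples) and split off the first n * test_fraction
// indices as the test set. Seeding makes the split reproducible.
std::pair<std::vector<int>, std::vector<int>>
train_test_split_indices(int n_samples, double test_fraction, unsigned seed) {
    std::vector<int> idx(static_cast<std::size_t>(n_samples));
    std::iota(idx.begin(), idx.end(), 0);

    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);

    const int n_test = static_cast<int>(n_samples * test_fraction);
    std::vector<int> test(idx.begin(), idx.begin() + n_test);
    std::vector<int> train(idx.begin() + n_test, idx.end());
    return {train, test};
}
```

For k-fold cross-validation the same idea applies: shuffle once, then rotate which contiguous slice of indices serves as the held-out fold.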

Performance considerations

Solver selection

Regression models automatically choose the best solver based on problem geometry:
  • Normal equations (Cholesky): Fast for n >> d, O(nd² + d³)
  • SVD decomposition: Stable for d >> n or ill-conditioned problems
  • Jacobi SVD: Maximally stable fallback for difficult problems

Feature preprocessing

MLPP models handle standardization automatically:
  • Features are standardized internally (zero mean, unit variance)
  • Coefficients are returned in the original feature space
  • No manual preprocessing required
// Model handles standardization internally
LinearRegression model;
model.fit(X_raw, y);  // No need to standardize X_raw first
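The back-transform to the original feature space follows from the substitution z_j = (x_j − μ_j) / σ_j: a coefficient w_std[j] fitted on standardized features maps to w_std[j] / σ_j, and the intercept absorbs the shifted means. The sketch below shows that arithmetic explicitly; the function name and struct are illustrative, not MLPP API.

```cpp
#include <cstddef>
#include <vector>

struct LinearCoeffs {
    std::vector<double> w;
    double b;
};

// Map coefficients fitted on standardized features back to the
// original space: w_orig[j] = w_std[j] / sigma[j],
// b_orig = b_std - sum_j w_std[j] * mu[j] / sigma[j].
LinearCoeffs destandardize(const std::vector<double>& w_std, double b_std,
                           const std::vector<double>& mu,
                           const std::vector<double>& sigma) {
    LinearCoeffs out{std::vector<double>(w_std.size()), b_std};
    for (std::size_t j = 0; j < w_std.size(); ++j) {
        out.w[j] = w_std[j] / sigma[j];
        out.b -= w_std[j] * mu[j] / sigma[j];
    }
    return out;
}
```

Both parameterizations produce identical predictions for any input x, which is why users only ever see coefficients in the original feature space.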

Next steps

Explore specific algorithm families:
