MLPP uses Eigen as its primary linear algebra backend, providing efficient matrix operations and numerical stability. The library defines consistent type aliases and data structures across all algorithms.

Eigen integration

When Eigen 3.4+ is detected, MLPP automatically integrates it for linear algebra operations:
if(MLPP_USE_EIGEN)
    find_package(Eigen3 3.4 QUIET NO_MODULE)
    if(Eigen3_FOUND)
        target_link_libraries(mlpp INTERFACE Eigen3::Eigen)
        target_compile_definitions(mlpp INTERFACE MLPP_HAS_EIGEN)
    endif()
endif()
The MLPP_HAS_EIGEN definition enables Eigen-based implementations throughout the codebase.

Matrix and vector types

MLPP algorithms use Eigen’s templated matrix and vector types with consistent aliases:
Learning/Regression/linear_regression.hpp
template <typename Scalar = double>
class LinearRegression {
public:
    using Matrix = Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
    using Vector = Eigen::Matrix<Scalar, Eigen::Dynamic, 1>;
    using Index  = Eigen::Index;
    
    // ...
};

Type characteristics

Dynamic sizing

Matrices and vectors use Eigen::Dynamic for runtime-sized containers, enabling flexible data dimensions.

Row-major storage

Matrices use row-major layout (overriding Eigen's column-major default), optimizing for the row-wise iteration patterns common in ML algorithms.

Template parameters

The scalar type is a template parameter (defaulting to double), letting users trade numerical precision against speed and memory.

Eigen interop

Types are direct Eigen instantiations, ensuring zero-cost interoperability with Eigen-based code.

Common type patterns

MLPP establishes consistent naming conventions for data structures:

Feature matrices

Feature matrices are typically named X with shape (n_samples, n_features) in row-major format:
void fit(const Matrix& X, const Vector& y);
Each row represents a single sample, and each column represents a feature dimension.

Target vectors

Target values are column vectors named y with length n_samples:
Vector predict(const Matrix& X) const;

Coefficient vectors

Learned parameters are stored as column vectors whose length equals the number of features:
Vector coef_;        // Coefficient vector in original feature space
Scalar intercept_;   // Bias term

Dataset abstractions

For structured learning tasks, MLPP provides high-level dataset abstractions:
Learning/Clustering/clustering_dataset.hpp
namespace mlpp::unsupervised::clustering {

template<Numeric T>
class Record {
public:
    explicit Record(std::shared_ptr<Schema<T>> schema_ptr);
    
    AttrValue<T>& labelValue();
    const AttrValue<T>& labelValue() const;
    
    std::size_t get_id() const;
    std::size_t get_label() const;
    
private:
    std::shared_ptr<Schema<T>> schema_;
    AttrValue<T> label_;
    AttrValue<T> id_;
    std::vector<AttrValue<T>> features_;
};

template<Numeric T>
class Dataset {
public:
    explicit Dataset(std::shared_ptr<Schema<T>> schema_ptr);
    
    std::size_t num_attr() const;
    const std::shared_ptr<Schema<T>>& schema() const;
    
    AttrValue<T>& operator()(std::size_t i, std::size_t j);
    const AttrValue<T>& operator()(std::size_t i, std::size_t j) const;
    
    bool is_numeric() const;
    bool is_categorical() const;
    
private:
    std::shared_ptr<Schema<T>> schema_;
    std::vector<std::shared_ptr<Record<T>>> records_;
};

}

Schema-based organization

The Dataset class provides:
  • Type safety: Schema defines attribute types and constraints
  • Labeled data: Built-in support for supervised learning with labels
  • Record abstraction: Individual samples with typed feature access
  • Flexibility: Supports both numeric and categorical attributes
The schema-based approach is particularly useful for clustering and classification tasks where feature types and metadata matter.

Numeric concepts

MLPP uses C++20 concepts to constrain template parameters:
template<Numeric T>
class Record { /* ... */ };
The Numeric concept ensures type parameters support arithmetic operations required for machine learning computations.

Memory management

MLPP follows modern C++ memory management practices:
  • Value semantics: Algorithms store data members by value when possible
  • Smart pointers: Shared ownership uses std::shared_ptr
  • Const correctness: Read-only operations marked const throughout
  • Move semantics: Large objects support efficient moves

Example: Linear regression storage

Learning/Regression/linear_regression.hpp
private:
    // Hyper-parameters
    bool        fit_intercept_;
    Scalar      lambda_;
    SolveMethod method_;
    
    // Learned parameters (original feature space)
    Vector  coef_;           // Coefficient vector
    Scalar  intercept_{};    // Bias term
    
    Vector  feature_mean_;   // For standardization
    Vector  feature_std_;
    Scalar  target_mean_{};
    
    bool   fitted_ = false;
    Scalar cond_number_ = Scalar(-1);

Performance considerations

Row-major matrices optimize cache locality for row-wise iteration, but column operations may be slower. Choose storage order based on your dominant access pattern.
Eigen provides:
  • Vectorization: SIMD optimizations for supported architectures
  • Lazy evaluation: Expression templates minimize temporary allocations
  • Block operations: Efficient sub-matrix views without copying

Integration with external data

MLPP’s Eigen-based types integrate seamlessly with:
  • NumPy arrays (via Python bindings)
  • OpenCV matrices (when MLPP_HAS_OPENCV is defined)
  • Raw C++ arrays (via Eigen::Map)
  • Standard library containers
Example mapping from raw pointer:
double* raw_data = /* ... */;
Eigen::Map<Matrix> X(raw_data, n_samples, n_features);
