Eigen integration
When Eigen 3.4+ is detected, MLPP automatically integrates it for linear algebra operations. The MLPP_HAS_EIGEN definition enables Eigen-based implementations throughout the codebase.
Matrix and vector types
MLPP algorithms use Eigen’s templated matrix and vector types with consistent aliases (see Learning/Regression/linear_regression.hpp).
Type characteristics
Dynamic sizing
Matrices and vectors use Eigen::Dynamic for runtime-sized containers, enabling flexible data dimensions.
Row-major storage
Matrices use row-major layout by default, optimizing for row-wise iteration patterns common in ML algorithms.
Template parameters
Scalar type is templated (typically double), allowing users to choose precision vs. performance tradeoffs.
Eigen interop
Types are direct Eigen instantiations, ensuring zero-cost interoperability with Eigen-based code.
Common type patterns
MLPP establishes consistent naming conventions for data structures.
Feature matrices
Feature matrices are typically named X, with shape (n_samples, n_features) in row-major format.
Target vectors
Target values are column vectors named y, with length n_samples.
Coefficient vectors
Learned parameters are stored as column vectors with length matching the feature dimension.
Dataset abstractions
For structured learning tasks, MLPP provides high-level dataset abstractions (see Learning/Clustering/clustering_dataset.hpp).
Schema-based organization
The Dataset class provides:
- Type safety: Schema defines attribute types and constraints
- Labeled data: Built-in support for supervised learning with labels
- Record abstraction: Individual samples with typed feature access
- Flexibility: Supports both numeric and categorical attributes
The schema-based approach is particularly useful for clustering and classification tasks where feature types and metadata matter.
Numeric concepts
MLPP uses C++20 concepts to constrain template parameters. The Numeric concept ensures type parameters support the arithmetic operations required for machine learning computations.
Memory management
MLPP follows modern C++ memory management practices:
- Value semantics: Algorithms store data members by value when possible
- Smart pointers: Shared ownership uses std::shared_ptr
- Const correctness: Read-only operations are marked const throughout
- Move semantics: Large objects support efficient moves
Example: Linear regression storage
Learning/Regression/linear_regression.hpp
Performance considerations
Eigen provides:
- Vectorization: SIMD optimizations for supported architectures
- Lazy evaluation: Expression templates minimize temporary allocations
- Block operations: Efficient sub-matrix views without copying
Integration with external data
MLPP’s Eigen-based types integrate seamlessly with:
- NumPy arrays (via Python bindings)
- OpenCV matrices (when MLPP_HAS_OPENCV is defined)
- Raw C++ arrays (via Eigen::Map)
- Standard library containers