Kernel functions enable SVMs to learn nonlinear decision boundaries by implicitly mapping data to high-dimensional feature spaces. MLPP provides a flexible kernel framework with built-in RKHS kernels, efficient caching, and type-safe composition.
Kernel interface
All kernels in MLPP inherit from the abstract Kernel base class and implement the function call operator:
namespace mlpp::classifiers::kernel {

class Kernel {
public:
    virtual ~Kernel() = default;

    [[nodiscard]]
    virtual double operator()(const Vector& x,
                              const Vector& y) const noexcept = 0;

    [[nodiscard]]
    virtual std::unique_ptr<Kernel> clone() const = 0;
};

}  // namespace mlpp::classifiers::kernel
The clone() method enables value semantics through type erasure, allowing kernels to be copied and stored polymorphically.
Type-erased kernel handle
The KernelFunction class provides a copyable, value-semantic wrapper around kernel implementations:
class KernelFunction {
public:
    explicit KernelFunction(std::unique_ptr<Kernel> impl);

    [[nodiscard]]
    double operator()(const Vector& x,
                      const Vector& y) const noexcept;

    [[nodiscard]]
    bool valid() const noexcept;
};
This design allows kernels to be passed by value while maintaining polymorphic behavior.
RKHS kernels
MLPP provides three standard reproducing kernel Hilbert space (RKHS) kernels:
Linear kernel
The simplest kernel is the inner product in the input space, corresponding to the identity feature map in ℝⁿ:
k(x, y) = ⟨x, y⟩
#include <MLPP/Supervised Learning/Classifiers/SVM/Kernel/rkhs_kernels.hpp>
using namespace mlpp::classifiers::kernel;
LinearKernel kernel;
double similarity = kernel(x, y);
Training with the linear kernel is equivalent to a standard linear SVM; the model is simply expressed in the dual formulation.
Polynomial kernel
The polynomial kernel generates a finite-dimensional RKHS whose dimension grows combinatorially with the degree:
k(x, y) = (γ⟨x, y⟩ + c)^d
PolynomialKernel kernel(
    0.1,  // gamma: scaling factor
    1.0,  // coef0: bias term
    3     // degree: polynomial order
);
Parameters:
- gamma (γ): Controls the influence of individual features
- coef0 (c): Allows trading off higher-order vs. lower-order terms
- degree (d): Polynomial degree; higher values increase model capacity
RBF kernel (Gaussian)
The radial basis function kernel induces an infinite-dimensional, separable RKHS; because it is a universal kernel, SVMs trained with it are universally consistent under mild conditions:
k(x, y) = exp(-γ‖x - y‖²)
RBFKernel kernel(0.5); // gamma parameter
Parameter selection:
- Small γ: Smooth decision boundaries (high bias)
- Large γ: Complex decision boundaries (high variance)
- Rule of thumb: γ = 1 / (n_features × variance(X))
The RBF kernel can overfit when γ is too large. Always validate performance on held-out data.
Kernel caching
The KernelCache class precomputes and stores the Gram matrix K where K_ij = k(x_i, x_j):
namespace mlpp::classifiers::kernel {

class KernelCache {
public:
    KernelCache(const std::vector<Vector>& data,
                KernelFunction kernel);

    // Access a single entry (lazy evaluation)
    [[nodiscard]]
    double operator()(std::size_t i, std::size_t j) const;

    // Access the full Gram matrix (ensures all entries are computed)
    [[nodiscard]]
    const Matrix& gram_matrix() const;

    // Force computation of all entries
    void precompute() const;

    [[nodiscard]]
    std::size_t size() const noexcept;
};

}  // namespace mlpp::classifiers::kernel
Usage example
std::vector<Vector> training_data = load_data();

RBFKernel rbf(0.5);
KernelCache cache(training_data,
                  KernelFunction(std::make_unique<RBFKernel>(rbf)));

// Lazy evaluation: each entry is computed on first access
std::size_t i = 0, j = 1;
double k_ij = cache(i, j);

// Precompute all entries before training
cache.precompute();
Symmetry exploitation
The cache exploits kernel symmetry (K_ij = K_ji) to avoid redundant computation:
// Both calls return the same stored value
double a = cache(5, 10);
double b = cache(10, 5); // No recomputation
Memory usage
The Gram matrix requires O(n²) storage where n is the number of training samples. For large datasets:
- Consider using approximate kernel methods
- Use sparse kernel representations for low-rank kernels
- Apply sampling or mini-batch strategies
Computation
Kernel evaluation dominates SVM training time:
- Linear kernel: O(d) per evaluation, d = feature dimension
- Polynomial kernel: O(d) per evaluation
- RBF kernel: O(d) per evaluation
Precomputing the full Gram matrix is O(n²d), beneficial when n is moderate and multiple optimization passes are needed.
Kernel composition
Kernels can be composed to create custom similarity measures. The kernel property is preserved under:
- Addition: k(x, y) = k₁(x, y) + k₂(x, y)
- Scaling: k(x, y) = α·k₁(x, y) for α > 0
- Product: k(x, y) = k₁(x, y)·k₂(x, y)
class SumKernel : public Kernel {
    std::unique_ptr<Kernel> k1_, k2_;

public:
    SumKernel(std::unique_ptr<Kernel> k1,
              std::unique_ptr<Kernel> k2)
        : k1_(std::move(k1)), k2_(std::move(k2)) {}

    double operator()(const Vector& x,
                      const Vector& y) const noexcept override {
        return (*k1_)(x, y) + (*k2_)(x, y);
    }

    std::unique_ptr<Kernel> clone() const override {
        return std::make_unique<SumKernel>(
            k1_->clone(), k2_->clone()
        );
    }
};
Integration with SVM
Kernels are provided to the SVM constructor and managed through the kernel cache:
using namespace mlpp::classifiers::kernel;
std::vector<Vector> X_train = load_features();
Eigen::VectorXd y_train = load_labels();
SVM svm(
    X_train,
    y_train,
    KernelFunction(std::make_unique<RBFKernel>(0.5)),
    1.0  // C: regularization parameter
);
svm.fit();
During training, the SVM automatically uses the kernel cache to accelerate the Sequential Minimal Optimization (SMO) algorithm.