Kernel functions enable SVMs to learn nonlinear decision boundaries by implicitly mapping data to high-dimensional feature spaces. MLPP provides a flexible kernel framework with built-in RKHS kernels, efficient caching, and type-safe composition.

Kernel interface

All kernels in MLPP inherit from the abstract Kernel base class and implement the function call operator:
namespace mlpp::classifiers::kernel {

class Kernel {
public:
    virtual ~Kernel() = default;

    [[nodiscard]]
    virtual double operator()(const Vector& x,
                              const Vector& y) const noexcept = 0;

    [[nodiscard]]
    virtual std::unique_ptr<Kernel> clone() const = 0;
};

}
The clone() method enables value semantics through type erasure, allowing kernels to be copied and stored polymorphically.

Type-erased kernel handle

The KernelFunction class provides a copyable, value-semantic wrapper around kernel implementations:
class KernelFunction {
public:
    explicit KernelFunction(std::unique_ptr<Kernel> impl);

    [[nodiscard]]
    double operator()(const Vector& x,
                      const Vector& y) const noexcept;

    [[nodiscard]]
    bool valid() const noexcept;
};
This design allows kernels to be passed by value while maintaining polymorphic behavior.

RKHS kernels

MLPP provides three standard reproducing kernel Hilbert space (RKHS) kernels:

Linear kernel

The simplest kernel corresponds to the identity feature map in ℝⁿ:
k(x, y) = ⟨x, y⟩
#include <MLPP/Supervised Learning/Classifiers/SVM/Kernel/rkhs_kernels.hpp>

using namespace mlpp::classifiers::kernel;

LinearKernel kernel;
double similarity = kernel(x, y);
The linear kernel is equivalent to a standard linear SVM, but expressed in the dual formulation.

Polynomial kernel

The polynomial kernel generates a finite-dimensional RKHS whose dimension grows combinatorially with the degree:
k(x, y) = (γ⟨x, y⟩ + c)^d
PolynomialKernel kernel(
    0.1,    // gamma: scaling factor
    1.0,    // coef0: bias term
    3       // degree: polynomial order
);
Parameters:
  • gamma (γ): Controls the influence of individual features
  • coef0 (c): Allows trading off higher-order vs lower-order terms
  • degree (d): Polynomial degree; higher values increase model capacity

RBF kernel (Gaussian)

The radial basis function kernel induces an infinite-dimensional, separable RKHS; it is a universal kernel, so SVMs trained with it are universally consistent under mild regularity conditions:
k(x, y) = exp(-γ‖x - y‖²)
RBFKernel kernel(0.5);  // gamma parameter
Parameter selection:
  • Small γ: Smooth decision boundaries (high bias)
  • Large γ: Complex decision boundaries (high variance)
  • Rule of thumb: γ = 1 / (n_features × variance(X))
The RBF kernel can overfit when γ is too large. Always validate performance on held-out data.

Kernel caching

The KernelCache class precomputes and stores the Gram matrix K where K_ij = k(x_i, x_j):
namespace mlpp::classifiers::kernel {

class KernelCache {
public:
    KernelCache(const std::vector<Vector>& data,
                KernelFunction kernel);

    // Access single entry (lazy evaluation)
    [[nodiscard]]
    double operator()(std::size_t i, std::size_t j) const;

    // Access full Gram matrix (ensures all computed)
    [[nodiscard]]
    const Matrix& gram_matrix() const;

    // Force computation of all entries
    void precompute() const;

    [[nodiscard]]
    std::size_t size() const noexcept;
};

}

Usage example

std::vector<Vector> training_data = load_data();
RBFKernel rbf(0.5);

KernelCache cache(training_data, KernelFunction(
    std::make_unique<RBFKernel>(rbf)
));

// Lazy evaluation: computed on first access
double k_ij = cache(i, j);

// Precompute all entries for training
cache.precompute();

Symmetry exploitation

The cache exploits kernel symmetry (K_ij = K_ji) to avoid redundant computation:
// Both calls return the same stored value
double a = cache(5, 10);
double b = cache(10, 5);  // No recomputation

Performance considerations

Memory usage

The Gram matrix requires O(n²) storage where n is the number of training samples. For large datasets:
  • Consider using approximate kernel methods
  • Use sparse kernel representations for low-rank kernels
  • Apply sampling or mini-batch strategies

Computation

Kernel evaluation dominates SVM training time:
  • Linear kernel: O(d) per evaluation, d = feature dimension
  • Polynomial kernel: O(d) per evaluation
  • RBF kernel: O(d) per evaluation
Precomputing the full Gram matrix is O(n²d), beneficial when n is moderate and multiple optimization passes are needed.

Kernel composition

Kernels can be composed to create custom similarity measures. Positive semidefiniteness (the property that makes a function a valid kernel) is preserved under:
  1. Addition: k(x, y) = k₁(x, y) + k₂(x, y)
  2. Scaling: k(x, y) = α·k₁(x, y) for α > 0
  3. Product: k(x, y) = k₁(x, y)·k₂(x, y)
class SumKernel : public Kernel {
    std::unique_ptr<Kernel> k1_, k2_;
public:
    SumKernel(std::unique_ptr<Kernel> k1,
              std::unique_ptr<Kernel> k2)
        : k1_(std::move(k1)), k2_(std::move(k2)) {}

    double operator()(const Vector& x,
                      const Vector& y) const noexcept override {
        return (*k1_)(x, y) + (*k2_)(x, y);
    }

    std::unique_ptr<Kernel> clone() const override {
        return std::make_unique<SumKernel>(
            k1_->clone(), k2_->clone()
        );
    }
};

Integration with SVM

Kernels are provided to the SVM constructor and managed through the kernel cache:
using namespace mlpp::classifiers::kernel;

std::vector<Vector> X_train = load_features();
Eigen::VectorXd y_train = load_labels();

SVM svm(
    X_train,
    y_train,
    KernelFunction(std::make_unique<RBFKernel>(0.5)),
    1.0  // C parameter
);

svm.fit();
During training, the SVM uses the kernel cache automatically to accelerate the sequential minimal optimization (SMO) algorithm.
