Kernel functions enable SVMs to learn nonlinear decision boundaries by implicitly mapping data to high-dimensional feature spaces. MLPP provides a flexible kernel framework with built-in RKHS kernels, efficient caching, and type-safe composition.

Kernel interface

All kernels in MLPP inherit from the abstract Kernel base class and implement the function call operator:
namespace mlpp::classifiers::kernel {

class Kernel {
public:
    virtual ~Kernel() = default;

    [[nodiscard]]
    virtual double operator()(const Vector& x,
                              const Vector& y) const noexcept = 0;

    [[nodiscard]]
    virtual std::unique_ptr<Kernel> clone() const = 0;
};

}
The clone() method enables value semantics through type erasure, allowing kernels to be copied and stored polymorphically.

Type-erased kernel handle

The KernelFunction class provides a copyable, value-semantic wrapper around kernel implementations:
class KernelFunction {
public:
    explicit KernelFunction(std::unique_ptr<Kernel> impl);

    [[nodiscard]]
    double operator()(const Vector& x,
                      const Vector& y) const noexcept;

    [[nodiscard]]
    bool valid() const noexcept;
};
This design allows kernels to be passed by value while maintaining polymorphic behavior.

RKHS kernels

MLPP provides three standard reproducing kernel Hilbert space (RKHS) kernels:

Linear kernel

The simplest kernel corresponds to the identity feature map in ℝⁿ:
k(x, y) = ⟨x, y⟩
#include <MLPP/Supervised Learning/Classifiers/SVM/Kernel/rkhs_kernels.hpp>

using namespace mlpp::classifiers::kernel;

LinearKernel kernel;
double similarity = kernel(x, y);
The linear kernel is equivalent to a standard linear SVM, but expressed in the dual formulation.

Polynomial kernel

The polynomial kernel generates a finite-dimensional RKHS whose dimension grows combinatorially with the degree:
k(x, y) = (γ⟨x, y⟩ + c)^d
PolynomialKernel kernel(
    0.1,    // gamma: scaling factor
    1.0,    // coef0: bias term
    3       // degree: polynomial order
);
Parameters:
  • gamma (γ): Controls the influence of individual features
  • coef0 (c): Allows trading off higher-order vs lower-order terms
  • degree (d): Polynomial degree; higher values increase model capacity

RBF kernel (Gaussian)

The radial basis function kernel induces an infinite-dimensional, separable RKHS; it is a universal kernel, so SVMs trained with it are universally consistent under mild regularity conditions:
k(x, y) = exp(-γ‖x - y‖²)
RBFKernel kernel(0.5);  // gamma parameter
Parameter selection:
  • Small γ: Smooth decision boundaries (high bias)
  • Large γ: Complex decision boundaries (high variance)
  • Rule of thumb: γ = 1 / (n_features × variance(X))
The RBF kernel can overfit when γ is too large. Always validate performance on held-out data.

Kernel caching

The KernelCache class precomputes and stores the Gram matrix K where K_ij = k(x_i, x_j):
namespace mlpp::classifiers::kernel {

class KernelCache {
public:
    KernelCache(const std::vector<Vector>& data,
                KernelFunction kernel);

    // Access single entry (lazy evaluation)
    [[nodiscard]]
    double operator()(std::size_t i, std::size_t j) const;

    // Access full Gram matrix (ensures all computed)
    [[nodiscard]]
    const Matrix& gram_matrix() const;

    // Force computation of all entries
    void precompute() const;

    [[nodiscard]]
    std::size_t size() const noexcept;
};

}

Usage example

std::vector<Vector> training_data = load_data();
RBFKernel rbf(0.5);

KernelCache cache(training_data, KernelFunction(
    std::make_unique<RBFKernel>(rbf)
));

// Lazy evaluation: computed on first access
double k_ij = cache(i, j);

// Precompute all entries for training
cache.precompute();

Symmetry exploitation

The cache exploits kernel symmetry (K_ij = K_ji) to avoid redundant computation:
// Both calls return the same stored value
double a = cache(5, 10);
double b = cache(10, 5);  // No recomputation

Performance considerations

Memory usage

The Gram matrix requires O(n²) storage where n is the number of training samples. For large datasets:
  • Consider using approximate kernel methods
  • Use sparse kernel representations for low-rank kernels
  • Apply sampling or mini-batch strategies

Computation

Kernel evaluation dominates SVM training time:
  • Linear kernel: O(d) per evaluation, d = feature dimension
  • Polynomial kernel: O(d) per evaluation
  • RBF kernel: O(d) per evaluation
Precomputing the full Gram matrix is O(n²d), beneficial when n is moderate and multiple optimization passes are needed.

Kernel composition

Kernels can be composed to create custom similarity measures. Positive semidefiniteness (the property that makes a function a valid kernel) is preserved under:
  1. Addition: k(x, y) = k₁(x, y) + k₂(x, y)
  2. Scaling: k(x, y) = α·k₁(x, y) for α > 0
  3. Product: k(x, y) = k₁(x, y)·k₂(x, y)
class SumKernel : public Kernel {
    std::unique_ptr<Kernel> k1_, k2_;
public:
    SumKernel(std::unique_ptr<Kernel> k1,
              std::unique_ptr<Kernel> k2)
        : k1_(std::move(k1)), k2_(std::move(k2)) {}

    double operator()(const Vector& x,
                      const Vector& y) const noexcept override {
        return (*k1_)(x, y) + (*k2_)(x, y);
    }

    std::unique_ptr<Kernel> clone() const override {
        return std::make_unique<SumKernel>(
            k1_->clone(), k2_->clone()
        );
    }
};

Integration with SVM

Kernels are provided to the SVM constructor and managed through the kernel cache:
using namespace mlpp::classifiers::kernel;

std::vector<Vector> X_train = load_features();
Eigen::VectorXd y_train = load_labels();

SVM svm(
    X_train,
    y_train,
    KernelFunction(std::make_unique<RBFKernel>(0.5)),
    1.0  // C parameter
);

svm.fit();
During training, the SVM uses the kernel cache automatically to accelerate the sequential minimal optimization (SMO) algorithm.
