Overview
The KernelPerceptron class implements the dual-form kernelized perceptron algorithm for binary classification. It uses kernel functions to map data implicitly into high-dimensional feature spaces without ever computing the mapping explicitly.
Namespace: mlpp::classifiers
Decision function:
f(x) = sign(Σᵢ αᵢ yᵢ k(xᵢ, x))
where:
- αᵢ are the dual coefficients (mistake counters)
- yᵢ are the training labels
- k(·,·) is the kernel function
Constructor
KernelPerceptron
Construct a kernelized perceptron classifier.
KernelPerceptron(const std::vector<kernel::Vector>& X,
                 const std::vector<int>& y,
                 kernel::KernelFunction kernel,
                 std::size_t max_epochs = 100);
X
const std::vector<kernel::Vector>&
Training feature vectors. Each element is an Eigen::VectorXd.
y
const std::vector<int>&
Training labels. Must contain values in {-1, +1}.
kernel
kernel::KernelFunction
Kernel function with signature double(const Vector&, const Vector&).
max_epochs
std::size_t
Maximum number of passes through the training data. Defaults to 100.
#include "SVM/Kernel Perceptron/kernel_perceptron.hpp"
#include <Eigen/Dense>
#include <cmath>
#include <vector>
using namespace mlpp::classifiers;
using Vector = Eigen::VectorXd;
// Prepare data
std::vector<Vector> X_train;
std::vector<int> y_train;
Vector x1(2); x1 << 1.0, 2.0;
Vector x2(2); x2 << 2.0, 3.0;
Vector x3(2); x3 << 5.0, 6.0;
Vector x4(2); x4 << 6.0, 7.0;
X_train = {x1, x2, x3, x4};
y_train = {-1, -1, 1, 1};
// Define polynomial kernel
auto poly_kernel = [](const Vector& a, const Vector& b) {
    double degree = 2.0;
    return std::pow(a.dot(b) + 1.0, degree);
};
// Create classifier
KernelPerceptron kp(X_train, y_train, poly_kernel, 50);
Methods
fit
Train the kernel perceptron model.
Runs the perceptron learning algorithm for up to max_epochs passes through the training data. Updates dual coefficients (alphas) when misclassifications occur.
The algorithm:
- For each training example (xᵢ, yᵢ)
- If yᵢ · f(xᵢ) ≤ 0 (misclassification), increment αᵢ
- Repeat until convergence or max_epochs reached
kp.fit();
std::cout << "Training complete. Mistakes: " << kp.mistakes() << std::endl;
predict
Predict the class label for a new sample.
[[nodiscard]]
int predict(const kernel::Vector& x) const;
Input feature vector (Eigen::VectorXd) to classify
Predicted class label: +1 or -1
Computes the decision function and returns its sign:
predict(x) = sign(Σᵢ αᵢ yᵢ k(xᵢ, x))
Vector test_point(2);
test_point << 3.5, 4.0;
int prediction = kp.predict(test_point);
std::cout << "Predicted class: " << prediction << std::endl; // +1 or -1
mistakes
Get the total number of mistakes made during training.
[[nodiscard]]
std::size_t mistakes() const noexcept;
Total count of misclassifications encountered during training
This count indicates how many times the perceptron made an incorrect prediction during the training process. A lower count generally means the data are easier to separate in the kernel-induced feature space and that training converged quickly.
std::cout << "Training errors: " << kp.mistakes() << std::endl;
Common kernel functions
The perceptron works with any valid kernel function. Here are common choices:
Linear kernel
auto linear_kernel = [](const Vector& a, const Vector& b) {
    return a.dot(b);
};
Polynomial kernel
auto polynomial_kernel = [](const Vector& a, const Vector& b) {
    double degree = 3.0;
    double coef0 = 1.0;
    return std::pow(a.dot(b) + coef0, degree);
};
RBF (Gaussian) kernel
auto rbf_kernel = [](const Vector& a, const Vector& b) {
    double gamma = 0.5;
    return std::exp(-gamma * (a - b).squaredNorm());
};
Sigmoid kernel
auto sigmoid_kernel = [](const Vector& a, const Vector& b) {
    double alpha = 0.01;
    double coef0 = 0.0;
    return std::tanh(alpha * a.dot(b) + coef0);
};
Example usage
#include "SVM/Kernel Perceptron/kernel_perceptron.hpp"
#include <Eigen/Dense>
#include <cmath>
#include <iostream>
#include <vector>
using namespace mlpp::classifiers;
using Vector = Eigen::VectorXd;
int main() {
    // Create XOR-like dataset (not linearly separable)
    std::vector<Vector> X_train;
    std::vector<int> y_train;
    Vector v1(2); v1 << 0.0, 0.0;
    Vector v2(2); v2 << 0.0, 1.0;
    Vector v3(2); v3 << 1.0, 0.0;
    Vector v4(2); v4 << 1.0, 1.0;
    X_train = {v1, v2, v3, v4};
    y_train = {-1, 1, 1, -1}; // XOR pattern

    // Use polynomial kernel (degree 2) to handle non-linearity
    auto poly2_kernel = [](const Vector& a, const Vector& b) {
        return std::pow(a.dot(b) + 1.0, 2.0);
    };

    // Train kernel perceptron
    KernelPerceptron kp(X_train, y_train, poly2_kernel, 100);
    kp.fit();
    std::cout << "Training complete!" << std::endl;
    std::cout << "Total mistakes: " << kp.mistakes() << std::endl;

    // Test on training data
    std::cout << "\nPredictions on training data:" << std::endl;
    for (std::size_t i = 0; i < X_train.size(); ++i) {
        int pred = kp.predict(X_train[i]);
        std::cout << "Input: (" << X_train[i].transpose() << ") "
                  << "-> Predicted: " << pred
                  << ", Actual: " << y_train[i]
                  << (pred == y_train[i] ? " ✓" : " ✗") << std::endl;
    }

    // Test on new points
    std::vector<Vector> X_test;
    Vector t1(2); t1 << 0.1, 0.9;
    Vector t2(2); t2 << 0.9, 0.1;
    Vector t3(2); t3 << 0.5, 0.5;
    X_test = {t1, t2, t3};
    std::cout << "\nPredictions on test data:" << std::endl;
    for (const auto& x : X_test) {
        int pred = kp.predict(x);
        std::cout << "Input: (" << x.transpose() << ") -> " << pred << std::endl;
    }
    return 0;
}
Notes
- The kernel perceptron is particularly useful for non-linearly separable data
- Unlike an SVM, it does not maximize the margin; it only finds some separating hyperplane in feature space
- Training time depends on the number of mistakes made
- Kernel caching is used internally for efficiency
- For data that are separable in the kernel-induced feature space, the algorithm is guaranteed to converge (Novikoff's theorem)
- For non-separable data, training stops after max_epochs passes
- The dual coefficients αᵢ represent how many times each training sample was misclassified