
Overview

The KernelPerceptron class implements the dual-form kernelized perceptron algorithm for binary classification. A kernel function implicitly maps the data into a high-dimensional feature space without ever computing the mapping explicitly.

Namespace: mlpp::classifiers

Decision function:
f(x) = sign(Σᵢ αᵢ yᵢ k(xᵢ, x))
where:
  • αᵢ are the dual coefficients (mistake counters)
  • yᵢ are the training labels
  • k(·,·) is the kernel function

Constructor

KernelPerceptron

Construct a kernelized perceptron classifier.
KernelPerceptron(const std::vector<kernel::Vector>& X,
                 const std::vector<int>& y,
                 kernel::KernelFunction kernel,
                 std::size_t max_epochs = 100);
Parameters:

X (const std::vector<kernel::Vector>&)
  Training feature vectors. Each element is an Eigen::VectorXd.
y (const std::vector<int>&)
  Training labels. Must contain values in {-1, +1}.
kernel (kernel::KernelFunction)
  Kernel function with signature double(const Vector&, const Vector&).
max_epochs (std::size_t, default: 100)
  Maximum number of passes through the training data.
#include "SVM/Kernel Perceptron/kernel_perceptron.hpp"
#include <Eigen/Dense>
#include <cmath>   // std::pow for the polynomial kernel
#include <vector>

using namespace mlpp::classifiers;
using Vector = Eigen::VectorXd;

// Prepare data
std::vector<Vector> X_train;
std::vector<int> y_train;

Vector x1(2); x1 << 1.0, 2.0;
Vector x2(2); x2 << 2.0, 3.0;
Vector x3(2); x3 << 5.0, 6.0;
Vector x4(2); x4 << 6.0, 7.0;

X_train = {x1, x2, x3, x4};
y_train = {-1, -1, 1, 1};

// Define polynomial kernel
auto poly_kernel = [](const Vector& a, const Vector& b) {
    double degree = 2.0;
    return std::pow(a.dot(b) + 1.0, degree);
};

// Create classifier
KernelPerceptron kp(X_train, y_train, poly_kernel, 50);

Methods

fit

Train the kernel perceptron model.
void fit();
Runs the perceptron learning algorithm for up to max_epochs passes through the training data. Updates dual coefficients (alphas) when misclassifications occur. The algorithm:
  1. For each training example (xᵢ, yᵢ)
  2. If yᵢ · f(xᵢ) ≤ 0 (misclassification), increment αᵢ
  3. Repeat until convergence or max_epochs reached
kp.fit();
std::cout << "Training complete. Mistakes: " << kp.mistakes() << std::endl;

predict

Predict the class label for a new sample.
[[nodiscard]]
int predict(const kernel::Vector& x) const;
Parameters:

x (const kernel::Vector&)
  Input feature vector (Eigen::VectorXd) to classify.

Returns:

int
  Predicted class label: +1 or -1.
Computes the decision function and returns its sign:
predict(x) = sign(Σᵢ αᵢ yᵢ k(xᵢ, x))
Vector test_point(2);
test_point << 3.5, 4.0;

int prediction = kp.predict(test_point);
std::cout << "Predicted class: " << prediction << std::endl;  // +1 or -1

mistakes

Get the total number of mistakes made during training.
[[nodiscard]]
std::size_t mistakes() const noexcept;
Returns:

std::size_t
  Total count of misclassifications encountered during training.
This count indicates how many times the perceptron made an incorrect prediction during the training process. Fewer mistakes generally indicate better convergence.
std::cout << "Training errors: " << kp.mistakes() << std::endl;

Common kernel functions

The perceptron works with any valid (symmetric, positive semi-definite) kernel function. Common choices are shown below; the polynomial, RBF, and sigmoid examples additionally require <cmath>.

Linear kernel

auto linear_kernel = [](const Vector& a, const Vector& b) {
    return a.dot(b);
};

Polynomial kernel

auto polynomial_kernel = [](const Vector& a, const Vector& b) {
    double degree = 3.0;
    double coef0 = 1.0;
    return std::pow(a.dot(b) + coef0, degree);
};

RBF (Gaussian) kernel

auto rbf_kernel = [](const Vector& a, const Vector& b) {
    double gamma = 0.5;
    return std::exp(-gamma * (a - b).squaredNorm());
};

Sigmoid kernel

auto sigmoid_kernel = [](const Vector& a, const Vector& b) {
    double alpha = 0.01;
    double coef0 = 0.0;
    return std::tanh(alpha * a.dot(b) + coef0);
};

Example usage

#include "SVM/Kernel Perceptron/kernel_perceptron.hpp"
#include <Eigen/Dense>
#include <cmath>   // std::pow
#include <iostream>
#include <vector>

using namespace mlpp::classifiers;
using Vector = Eigen::VectorXd;

int main() {
    // Create XOR-like dataset (not linearly separable)
    std::vector<Vector> X_train;
    std::vector<int> y_train;
    
    Vector v1(2); v1 << 0.0, 0.0;
    Vector v2(2); v2 << 0.0, 1.0;
    Vector v3(2); v3 << 1.0, 0.0;
    Vector v4(2); v4 << 1.0, 1.0;
    
    X_train = {v1, v2, v3, v4};
    y_train = {-1, 1, 1, -1};  // XOR pattern
    
    // Use polynomial kernel (degree 2) to handle non-linearity
    auto poly2_kernel = [](const Vector& a, const Vector& b) {
        return std::pow(a.dot(b) + 1.0, 2.0);
    };
    
    // Train kernel perceptron
    KernelPerceptron kp(X_train, y_train, poly2_kernel, 100);
    kp.fit();
    
    std::cout << "Training complete!" << std::endl;
    std::cout << "Total mistakes: " << kp.mistakes() << std::endl;
    
    // Test on training data
    std::cout << "\nPredictions on training data:" << std::endl;
    for (size_t i = 0; i < X_train.size(); ++i) {
        int pred = kp.predict(X_train[i]);
        std::cout << "Input: (" << X_train[i].transpose() << ") "
                  << "-> Predicted: " << pred 
                  << ", Actual: " << y_train[i]
                  << (pred == y_train[i] ? " ✓" : " ✗") << std::endl;
    }
    
    // Test on new points
    std::vector<Vector> X_test;
    Vector t1(2); t1 << 0.1, 0.9;
    Vector t2(2); t2 << 0.9, 0.1;
    Vector t3(2); t3 << 0.5, 0.5;
    X_test = {t1, t2, t3};
    
    std::cout << "\nPredictions on test data:" << std::endl;
    for (const auto& x : X_test) {
        int pred = kp.predict(x);
        std::cout << "Input: (" << x.transpose() << ") -> " << pred << std::endl;
    }
    
    return 0;
}

Notes

  • The kernel perceptron is particularly useful for non-linearly separable data
  • Unlike SVM, it does not maximize the margin, only finds a separating hyperplane
  • Training time depends on the number of mistakes made
  • Kernel caching is used internally for efficiency
  • For linearly separable data, the algorithm is guaranteed to converge
  • For non-separable data, training stops after max_epochs
  • The dual coefficients αᵢ represent how many times each training sample was misclassified
