Overview
The KernelPerceptron class implements the dual-form kernelized perceptron algorithm for binary classification. It uses kernel functions to map data implicitly into high-dimensional feature spaces without ever computing the mapping explicitly.
Namespace: mlpp::classifiers
Decision function:
f(x) = sign(Σᵢ αᵢ yᵢ k(xᵢ, x))
where:
- αᵢ are the dual coefficients (mistake counters)
- yᵢ are the training labels
- k(·,·) is the kernel function
Constructor
KernelPerceptron
Construct a kernelized perceptron classifier.
KernelPerceptron(const std::vector<kernel::Vector>& X,
                 const std::vector<int>& y,
                 kernel::KernelFunction kernel,
                 std::size_t max_epochs = 100);
X
const std::vector<kernel::Vector>&
Training feature vectors. Each element is an Eigen::VectorXd.
y
const std::vector<int>&
Training labels. Must contain values in {-1, +1}.
kernel
kernel::KernelFunction
Kernel function with signature double(const Vector&, const Vector&).
max_epochs
std::size_t
Maximum number of passes through the training data. Defaults to 100.
#include "SVM/Kernel Perceptron/kernel_perceptron.hpp"
#include <Eigen/Dense>
#include <cmath>
#include <vector>
using namespace mlpp::classifiers;
using Vector = Eigen::VectorXd;
// Prepare data
std::vector<Vector> X_train;
std::vector<int> y_train;
Vector x1(2); x1 << 1.0, 2.0;
Vector x2(2); x2 << 2.0, 3.0;
Vector x3(2); x3 << 5.0, 6.0;
Vector x4(2); x4 << 6.0, 7.0;
X_train = {x1, x2, x3, x4};
y_train = {-1, -1, 1, 1};
// Define polynomial kernel
auto poly_kernel = [](const Vector& a, const Vector& b) {
    double degree = 2.0;
    return std::pow(a.dot(b) + 1.0, degree);
};
// Create classifier
KernelPerceptron kp(X_train, y_train, poly_kernel, 50);
Methods
fit
Train the kernel perceptron model.
Runs the perceptron learning algorithm for up to max_epochs passes through the training data. Updates dual coefficients (alphas) when misclassifications occur.
The algorithm:
- For each training example (xᵢ, yᵢ)
- If yᵢ · f(xᵢ) ≤ 0 (misclassification), increment αᵢ
- Repeat until convergence or max_epochs reached
kp.fit();
std::cout << "Training complete. Mistakes: " << kp.mistakes() << std::endl;
predict
Predict the class label for a new sample.
[[nodiscard]]
int predict(const kernel::Vector& x) const;
Input feature vector (Eigen::VectorXd) to classify
Predicted class label: +1 or -1
Computes the decision function and returns its sign:
predict(x) = sign(Σᵢ αᵢ yᵢ k(xᵢ, x))
Vector test_point(2);
test_point << 3.5, 4.0;
int prediction = kp.predict(test_point);
std::cout << "Predicted class: " << prediction << std::endl; // +1 or -1
mistakes
Get the total number of mistakes made during training.
[[nodiscard]]
std::size_t mistakes() const noexcept;
Total count of misclassifications encountered during training
This count indicates how many times the perceptron made an incorrect prediction during the training process. A lower count generally means the data are easier to separate in the kernel-induced feature space and that training converged quickly.
std::cout << "Training errors: " << kp.mistakes() << std::endl;
Common kernel functions
The perceptron works with any valid kernel function. Here are common choices:
Linear kernel
auto linear_kernel = [](const Vector& a, const Vector& b) {
    return a.dot(b);
};
Polynomial kernel
auto polynomial_kernel = [](const Vector& a, const Vector& b) {
    double degree = 3.0;
    double coef0 = 1.0;
    return std::pow(a.dot(b) + coef0, degree);
};
RBF (Gaussian) kernel
auto rbf_kernel = [](const Vector& a, const Vector& b) {
    double gamma = 0.5;
    return std::exp(-gamma * (a - b).squaredNorm());
};
Sigmoid kernel
auto sigmoid_kernel = [](const Vector& a, const Vector& b) {
    double alpha = 0.01;
    double coef0 = 0.0;
    return std::tanh(alpha * a.dot(b) + coef0);
};
Example usage
#include "SVM/Kernel Perceptron/kernel_perceptron.hpp"
#include <Eigen/Dense>
#include <cmath>
#include <iostream>
#include <vector>
using namespace mlpp::classifiers;
using Vector = Eigen::VectorXd;
int main() {
    // Create XOR-like dataset (not linearly separable)
    std::vector<Vector> X_train;
    std::vector<int> y_train;
    Vector v1(2); v1 << 0.0, 0.0;
    Vector v2(2); v2 << 0.0, 1.0;
    Vector v3(2); v3 << 1.0, 0.0;
    Vector v4(2); v4 << 1.0, 1.0;
    X_train = {v1, v2, v3, v4};
    y_train = {-1, 1, 1, -1}; // XOR pattern

    // Use polynomial kernel (degree 2) to handle non-linearity
    auto poly2_kernel = [](const Vector& a, const Vector& b) {
        return std::pow(a.dot(b) + 1.0, 2.0);
    };

    // Train kernel perceptron
    KernelPerceptron kp(X_train, y_train, poly2_kernel, 100);
    kp.fit();
    std::cout << "Training complete!" << std::endl;
    std::cout << "Total mistakes: " << kp.mistakes() << std::endl;

    // Test on training data
    std::cout << "\nPredictions on training data:" << std::endl;
    for (std::size_t i = 0; i < X_train.size(); ++i) {
        int pred = kp.predict(X_train[i]);
        std::cout << "Input: (" << X_train[i].transpose() << ") "
                  << "-> Predicted: " << pred
                  << ", Actual: " << y_train[i]
                  << (pred == y_train[i] ? " ✓" : " ✗") << std::endl;
    }

    // Test on new points
    std::vector<Vector> X_test;
    Vector t1(2); t1 << 0.1, 0.9;
    Vector t2(2); t2 << 0.9, 0.1;
    Vector t3(2); t3 << 0.5, 0.5;
    X_test = {t1, t2, t3};
    std::cout << "\nPredictions on test data:" << std::endl;
    for (const auto& x : X_test) {
        int pred = kp.predict(x);
        std::cout << "Input: (" << x.transpose() << ") -> " << pred << std::endl;
    }
    return 0;
}
Notes
- The kernel perceptron is particularly useful for non-linearly separable data
- Unlike an SVM, it does not maximize the margin; it only finds some separating hyperplane in feature space
- Training time depends on the number of mistakes made
- Kernel caching is used internally for efficiency
- For data that are separable in the kernel-induced feature space, the algorithm is guaranteed to converge (Novikoff's theorem)
- For non-separable data, training stops after max_epochs passes
- The dual coefficients αᵢ represent how many times each training sample was misclassified