Clustering algorithms partition data so that similar items end up in the same group. MLPP provides template-based cluster classes and efficient spatial indexing via KD-Trees.
Cluster classes
MLPP provides a hierarchy of cluster classes in the mlpp::unsupervised::clustering namespace for representing grouped data points.
Cluster
Base cluster class for grouping data records.
#include "Unsupervised Learning/Clustering/cluster.hpp"
template<typename DatasetT>
class Cluster {
public:
using dataset_type = DatasetT;
using record_type = typename DatasetT::record_type;
void set_id(std::size_t id) noexcept;
std::size_t id() const noexcept;
protected:
std::size_t id_ = 0;
std::vector<std::shared_ptr<record_type>> elements_;
};
Template parameters:
DatasetT - Dataset type containing records to cluster
Methods:
set_id(id) - Assign cluster identifier
id() - Get cluster identifier
CenterCluster
Cluster augmented with a centroid record representing its center.
template<typename DatasetT>
class CenterCluster : public Cluster<DatasetT> {
public:
using record_type = typename DatasetT::record_type;
explicit CenterCluster(const std::shared_ptr<record_type>& center);
const std::shared_ptr<record_type>& center() const noexcept;
std::shared_ptr<record_type>& center() noexcept;
protected:
std::shared_ptr<record_type> center_;
};
Constructor:
CenterCluster(center) - Initialize with cluster centroid
Methods:
center() - Access cluster centroid
Use cases:
- K-means clustering
- K-medoids clustering
- Any centroid-based partitioning
SubspaceCluster
Cluster with weighted subspace projection for high-dimensional data.
template<typename DatasetT>
class SubspaceCluster : public CenterCluster<DatasetT> {
public:
using value_type = typename DatasetT::value_type;
using record_type = typename DatasetT::record_type;
explicit SubspaceCluster(const std::shared_ptr<record_type>& center);
std::vector<value_type>& w() noexcept;
const std::vector<value_type>& w() const noexcept;
value_type& w(std::size_t i) noexcept;
const value_type& w(std::size_t i) const noexcept;
protected:
std::vector<value_type> w_;
};
Constructor:
SubspaceCluster(center) - Initialize with cluster centroid
Methods:
w() - Access weight vector for subspace projection
w(i) - Access individual weight at dimension i
Use cases:
- Subspace clustering in high dimensions
- Feature-weighted clustering
- Locally weighted projections
KD-Tree
KD-Tree provides efficient spatial indexing for nearest neighbor and radius searches in multi-dimensional space.
#include "Unsupervised Learning/Clustering/KD-Tree/kd_tree.h"
template<typename PointT, std::size_t Dim>
class kd_tree {
public:
kd_tree(const std::vector<PointT>* pts);
std::vector<std::size_t> radius_search(const PointT& q, double eps) const;
};
Template parameters:
PointT - Point type supporting operator[] for coordinate access
Dim - Number of dimensions (compile-time constant)
Constructor
kd_tree(const std::vector<PointT>* pts)
Builds KD-Tree from point collection.
Parameters:
pts - Pointer to vector of points (must remain valid during tree lifetime)
Radius search
std::vector<std::size_t> radius_search(const PointT& q, double eps) const
Finds all points within radius eps of query point q.
Parameters:
q - Query point
eps - Search radius (Euclidean distance)
Returns:
Vector of indices into original point collection
Example usage
#include "Unsupervised Learning/Clustering/KD-Tree/kd_tree.h"
#include <vector>
#include <array>
// Define 3D points
using Point3D = std::array<double, 3>;
std::vector<Point3D> points = {
{1.0, 2.0, 3.0},
{4.0, 5.0, 6.0},
{1.1, 2.1, 3.1},
{10.0, 10.0, 10.0}
};
// Build KD-Tree
kd_tree<Point3D, 3> tree(&points);
// Find neighbors within radius 0.5
Point3D query = {1.0, 2.0, 3.0};
auto neighbors = tree.radius_search(query, 0.5);
// neighbors contains indices 0 and 2 (order may vary)
KD-Trees provide O(log n) average-case search performance, making them ideal for DBSCAN clustering, local density estimation, and nearest neighbor queries.
Performance notes:
- KD-Tree construction: O(n log n) time complexity
- Radius search: O(log n) average case, O(n) worst case
- Template instantiation: Dim must be a compile-time constant
- Memory overhead: internal tree nodes add O(n) storage on top of the point data
Best practices:
- Use dimensionality reduction before clustering high-dimensional data; KD-Tree search performance degrades as dimensionality grows
- Combine KD-Tree with DBSCAN for density-based clustering
- Apply PCA before K-means to reduce noise and improve convergence