
KMeans

K-Means clustering algorithm. Partitions n samples into k clusters by minimizing the within-cluster sum of squared distances to the cluster centroids. Algorithm: Lloyd’s algorithm (iterative refinement):
  1. Initialize k centroids (random or k-means++)
  2. Assign each point to its nearest centroid
  3. Update each centroid to the mean of its assigned points
  4. Repeat steps 2 and 3 until convergence (change in inertia below tol) or until maxIter iterations have run
Time complexity: O(n * k * i * d), where n = samples, k = clusters, i = iterations, d = features
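The four steps above can be sketched in plain TypeScript. This is a minimal illustration on number[][] arrays, not deepbox Tensors. The helper names (sqDist, assign, lloyd) are hypothetical, and the initial centroids are passed in explicitly rather than chosen by init. The stopping rule mirrors the tol description (change in inertia between iterations). deepbox's actual implementation may differ, for example in how it handles empty clusters.

```typescript
// Minimal sketch of Lloyd's algorithm on plain arrays (hypothetical helpers, not the deepbox API).
type Point = number[];

function sqDist(a: Point, b: Point): number {
  let s = 0;
  for (let j = 0; j < a.length; j++) s += (a[j] - b[j]) ** 2;
  return s;
}

// Step 2: assign each point to its nearest centroid and return the resulting inertia.
function assign(X: Point[], centers: Point[], labels: number[]): number {
  let inertia = 0;
  for (let i = 0; i < X.length; i++) {
    let best = 0;
    let bestD = Infinity;
    for (let c = 0; c < centers.length; c++) {
      const d = sqDist(X[i], centers[c]);
      if (d < bestD) {
        bestD = d;
        best = c;
      }
    }
    labels[i] = best;
    inertia += bestD;
  }
  return inertia;
}

function lloyd(X: Point[], initCenters: Point[], maxIter = 300, tol = 1e-4) {
  let centers = initCenters.map((c) => [...c]);
  const labels = new Array<number>(X.length).fill(0);
  let inertia = assign(X, centers, labels);
  let nIter = 0;
  while (nIter < maxIter) {
    nIter++;
    // Step 3: move each centroid to the mean of its assigned points.
    const sums = centers.map((c) => new Array<number>(c.length).fill(0));
    const counts = new Array<number>(centers.length).fill(0);
    for (let i = 0; i < X.length; i++) {
      counts[labels[i]]++;
      for (let j = 0; j < X[i].length; j++) sums[labels[i]][j] += X[i][j];
    }
    // An empty cluster keeps its previous centroid in this sketch.
    centers = centers.map((c, k) => (counts[k] > 0 ? sums[k].map((s) => s / counts[k]) : c));
    // Back to step 2, then step 4: stop once inertia changes by less than tol.
    const newInertia = assign(X, centers, labels);
    const converged = Math.abs(inertia - newInertia) < tol;
    inertia = newInertia;
    if (converged) break;
  }
  return { centers, labels, inertia, nIter };
}
```

Each assignment pass costs O(n * k * d). Over i iterations, that gives the complexity stated above.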

Constructor

new KMeans(options?: {
  nClusters?: number;
  maxIter?: number;
  tol?: number;
  init?: "random" | "kmeans++";
  randomState?: number;
})
options.nClusters (number, default: 8)
Number of clusters to form.
options.maxIter (number, default: 300)
Maximum number of iterations of the k-means algorithm.
options.tol (number, default: 1e-4)
Convergence tolerance. The algorithm stops when the change in inertia between consecutive iterations falls below this threshold.
options.init ("random" | "kmeans++", default: "kmeans++")
Initialization method. "random" uses random initial centroids. "kmeans++" spreads the initial centroids apart, which typically speeds convergence and yields a lower final inertia.
options.randomState (number)
Random seed for reproducibility.
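The sketch below illustrates standard k-means++ seeding for init: "kmeans++". The first centroid is a uniformly random sample. Each subsequent centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The sketch assumes that randomState seeds a PRNG. The mulberry32 generator and the kmeansPlusPlus helper are hypothetical stand-ins, and deepbox's RNG and sampling details may differ.

```typescript
// Hypothetical k-means++ seeding sketch on plain arrays; not the deepbox implementation.
type Point = number[];

function sqDist(a: Point, b: Point): number {
  let s = 0;
  for (let j = 0; j < a.length; j++) s += (a[j] - b[j]) ** 2;
  return s;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1), standing in for randomState.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function kmeansPlusPlus(X: Point[], k: number, seed: number): Point[] {
  const rand = mulberry32(seed);
  // First center: a uniformly random sample.
  const centers: Point[] = [[...X[Math.floor(rand() * X.length)]]];
  // d2[i]: squared distance from X[i] to its nearest chosen center.
  const d2 = X.map((x) => sqDist(x, centers[0]));
  while (centers.length < k) {
    const total = d2.reduce((s, v) => s + v, 0);
    let next = -1;
    if (total > 0) {
      // Pick index i with probability d2[i] / total; points already chosen (d2 = 0) are skipped.
      let r = rand() * total;
      for (let i = 0; i < X.length; i++) {
        if (d2[i] === 0) continue;
        next = i;
        if (r < d2[i]) break;
        r -= d2[i];
      }
    } else {
      next = Math.floor(rand() * X.length); // every point coincides with a chosen center
    }
    centers.push([...X[next]]);
    for (let i = 0; i < X.length; i++) d2[i] = Math.min(d2[i], sqDist(X[i], X[next]));
  }
  return centers;
}
```

With the same seed, the sketch always returns the same centroids. This is the reproducibility that randomState is meant to provide.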

Methods

fit

fit(X: Tensor, y?: Tensor): this
Fit K-Means clustering on training data.
X (Tensor, required)
Training data of shape (n_samples, n_features).
y (Tensor, optional)
Ignored; accepted only for API compatibility.
Returns: the fitted estimator (this), so calls can be chained.

predict

predict(X: Tensor): Tensor
Predict the closest cluster for each sample in X.
X (Tensor, required)
Samples of shape (n_samples, n_features).
Returns: cluster labels of shape (n_samples,).

fitPredict

fitPredict(X: Tensor, y?: Tensor): Tensor
Fit the model on X and return the cluster label of each training sample in one step.
Returns: cluster labels of shape (n_samples,).

Properties

clusterCenters (Tensor)
Coordinates of the cluster centers, of shape (n_clusters, n_features).
labels (Tensor)
Cluster label of each training sample, of shape (n_samples,).
inertia (number)
Sum of squared distances of samples to their closest cluster center (the quantity k-means minimizes).
nIter (number)
Number of iterations run.

Example

import { KMeans } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]);
const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

const labels = kmeans.predict(X);
console.log('Cluster labels:', labels);
console.log('Centroids:', kmeans.clusterCenters);

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Clusters points based on density: points in high-density regions are grouped together, while points in low-density regions are marked as noise. Algorithm:
  1. For each point, find all neighbors within eps distance
  2. If a point has at least minSamples points in its eps-neighborhood (counting the point itself), it is a core point
  3. Core points within eps of each other join the same cluster; non-core points within eps of a core point (border points) join that core point’s cluster
  4. Points not reachable from any core point are noise (label = -1)
Advantages:
  • No need to specify the number of clusters
  • Can find arbitrarily shaped clusters
  • Robust to outliers, which are labeled as noise instead of being forced into a cluster
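The algorithm can be sketched as a breadth-first expansion from each unvisited core point. This is a minimal illustration on plain number[][] arrays. It uses Euclidean distance and a brute-force O(n²) neighbor search. The dbscan function and its return shape are hypothetical. The sketch also assumes that a point counts toward its own neighborhood and that distances exactly equal to eps are included; deepbox may differ on these details.

```typescript
// Minimal DBSCAN sketch (Euclidean, brute-force neighbors); not the deepbox implementation.
type Point = number[];

function euclidean(a: Point, b: Point): number {
  let s = 0;
  for (let j = 0; j < a.length; j++) s += (a[j] - b[j]) ** 2;
  return Math.sqrt(s);
}

function dbscan(X: Point[], eps = 0.5, minSamples = 5) {
  const n = X.length;
  // Step 1: eps-neighborhood of each point (the point itself is included).
  const neighbors: number[][] = X.map((p) => {
    const idx: number[] = [];
    for (let j = 0; j < n; j++) if (euclidean(p, X[j]) <= eps) idx.push(j);
    return idx;
  });
  // Step 2: a core point has at least minSamples points in its neighborhood.
  const isCore = neighbors.map((nb) => nb.length >= minSamples);
  // Step 4: every point starts as noise (-1) until a cluster reaches it.
  const labels = new Array<number>(n).fill(-1);
  let cluster = 0;
  for (let i = 0; i < n; i++) {
    if (!isCore[i] || labels[i] !== -1) continue;
    // Step 3: grow a new cluster outward from this unvisited core point.
    labels[i] = cluster;
    const queue = [i];
    while (queue.length > 0) {
      const p = queue.shift()!;
      if (!isCore[p]) continue; // border points join the cluster but do not expand it
      for (const q of neighbors[p]) {
        if (labels[q] === -1) {
          labels[q] = cluster;
          queue.push(q);
        }
      }
    }
    cluster++;
  }
  const coreIndices = X.map((_, i) => i).filter((i) => isCore[i]);
  return { labels, coreIndices, nClusters: cluster };
}
```

On the six points from the example below, with eps = 3 and minSamples = 2, this sketch returns labels [0, 0, 0, 1, 1, -1].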

Constructor

new DBSCAN(options?: {
  eps?: number;
  minSamples?: number;
  metric?: "euclidean" | "manhattan";
})
options.eps (number, default: 0.5)
Maximum distance between two samples for one to be considered in the neighborhood of the other.
options.minSamples (number, default: 5)
Minimum number of samples in a neighborhood, including the point itself, for a point to be considered a core point.
options.metric ("euclidean" | "manhattan", default: "euclidean")
Distance metric used to compute eps-neighborhoods.
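Because eps is measured in the chosen metric, switching metrics changes which points count as neighbors. For any pair of points, the Manhattan distance is never smaller than the Euclidean distance, so with a fixed eps the Manhattan neighborhoods are never larger. The standalone helpers below are illustrative only, not deepbox functions. They check one pair of points against eps = 3.

```typescript
// Compare the two metrics accepted by options.metric on a single pair of points.
type Point = number[];

function euclidean(a: Point, b: Point): number {
  return Math.sqrt(a.reduce((s, ai, j) => s + (ai - b[j]) ** 2, 0));
}

function manhattan(a: Point, b: Point): number {
  return a.reduce((s, ai, j) => s + Math.abs(ai - b[j]), 0);
}

const a = [0, 0];
const b = [2, 2];
const eps = 3;
console.log(euclidean(a, b) <= eps); // true: sqrt(8) is about 2.83, so b is a neighbor of a
console.log(manhattan(a, b) <= eps); // false: |2| + |2| = 4 exceeds eps
```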

Methods

fit

fit(X: Tensor, y?: Tensor): this
Perform DBSCAN clustering on data X.
X (Tensor, required)
Training data of shape (n_samples, n_features).
Returns: the fitted estimator.

predict

predict(X: Tensor): Tensor
Throws: NotImplementedError. DBSCAN is a transductive clustering algorithm and does not support prediction on new data; use fitPredict() instead.

fitPredict

fitPredict(X: Tensor, y?: Tensor): Tensor
Fit DBSCAN and return cluster labels.
X (Tensor, required)
Training data of shape (n_samples, n_features).
Returns: cluster labels of shape (n_samples,). Noise points are labeled -1.

Properties

labels (Tensor)
Cluster labels assigned during fitting. Noise points are labeled -1.
nClusters (number)
Number of clusters found (excluding noise).
coreIndices (number[])
Indices of core samples, that is, points with at least minSamples neighbors within eps.
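Noise is labeled -1, so it does not count toward nClusters. The helper below recovers the cluster count and the number of noise points from a label array. It assumes the labels have already been converted to a plain number[]; the conversion from Tensor is not shown. The summarizeLabels helper is hypothetical, not part of deepbox.

```typescript
// Hypothetical helper: count clusters (excluding noise) and noise points in DBSCAN labels.
function summarizeLabels(labels: number[]): { nClusters: number; nNoise: number } {
  const clusterIds = new Set(labels.filter((l) => l !== -1));
  const nNoise = labels.filter((l) => l === -1).length;
  return { nClusters: clusterIds.size, nNoise };
}

console.log(summarizeLabels([0, 0, 0, 1, 1, -1])); // { nClusters: 2, nNoise: 1 }
```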

Example

import { DBSCAN } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]);
const dbscan = new DBSCAN({ eps: 3, minSamples: 2 });
const labels = dbscan.fitPredict(X);
// labels: [0, 0, 0, 1, 1, -1]  (-1 = noise)
