Clustering

KMeans

K-Means clustering. Partitions n samples into k clusters by minimizing the within-cluster sum of squared distances to the cluster centroids. Algorithm: Lloyd’s algorithm (iterative refinement):

Initialize k centroids (random or k-means++)
Assign each point to nearest centroid
Update centroids as mean of assigned points
Repeat until convergence or max iterations
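The steps above can be sketched in TypeScript on plain number[][] arrays. This is an illustrative sketch, not deepbox's implementation. It assumes the caller supplies the initial centroids (standing in for the random / k-means++ step), that an empty cluster keeps its previous centroid, and that convergence is tested on the change in inertia, as described under options.tol.

```typescript
// Sketch of Lloyd's algorithm on plain arrays (not deepbox internals).
// Assumptions: initial centroids come from the caller; an empty cluster
// keeps its previous centroid.
type Point = number[];

function squaredDistance(a: Point, b: Point): number {
  let sum = 0;
  for (let j = 0; j < a.length; j++) {
    const diff = a[j] - b[j];
    sum += diff * diff;
  }
  return sum;
}

// Assignment step: label each point with its nearest centroid; returns the inertia.
function assign(X: Point[], centers: Point[], labels: number[]): number {
  let inertia = 0;
  for (let i = 0; i < X.length; i++) {
    let best = 0;
    let bestDist = Infinity;
    for (let c = 0; c < centers.length; c++) {
      const d = squaredDistance(X[i], centers[c]);
      if (d < bestDist) {
        bestDist = d;
        best = c;
      }
    }
    labels[i] = best;
    inertia += bestDist;
  }
  return inertia;
}

// Update step: move each centroid to the mean of the points assigned to it.
function updateCenters(X: Point[], labels: number[], centers: Point[]): Point[] {
  const d = centers[0].length;
  const sums = centers.map(() => new Array<number>(d).fill(0));
  const counts = new Array<number>(centers.length).fill(0);
  X.forEach((x, i) => {
    counts[labels[i]]++;
    for (let j = 0; j < d; j++) sums[labels[i]][j] += x[j];
  });
  return sums.map((s, c) => (counts[c] === 0 ? centers[c].slice() : s.map((v) => v / counts[c])));
}

function lloyd(X: Point[], initialCenters: Point[], maxIter = 300, tol = 1e-4) {
  let centers = initialCenters.map((c) => c.slice());
  const labels = new Array<number>(X.length).fill(0);
  let inertia = assign(X, centers, labels);
  let nIter = 0;
  // Repeat until the inertia change drops below tol or maxIter is reached.
  while (nIter < maxIter) {
    centers = updateCenters(X, labels, centers);
    const previous = inertia;
    inertia = assign(X, centers, labels);
    nIter++;
    if (Math.abs(previous - inertia) < tol) break;
  }
  return { centers, labels, inertia, nIter };
}

const data: Point[] = [[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]];
const result = lloyd(data, [data[0], data[3]]);
console.log(result.labels); // [0, 0, 1, 1, 0, 1]
```

Updating the centroids before reassigning inside the loop means the returned labels always refer to the returned centroids.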

Time Complexity: O(n * k * i * d) where n=samples, k=clusters, i=iterations, d=features
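For a sense of scale, here is a rough worst-case count with hypothetical sizes (n = 10,000 samples, d = 10 features), the default nClusters of 8, and all 300 default iterations. Constant factors are ignored:

```latex
n = 10^{4},\quad k = 8,\quad i = 300,\quad d = 10
\quad\Longrightarrow\quad
n \cdot k \cdot i \cdot d = 10^{4} \cdot 8 \cdot 300 \cdot 10 = 2.4 \times 10^{8}
```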

Constructor

new KMeans(options?: {
  nClusters?: number;
  maxIter?: number;
  tol?: number;
  init?: "random" | "kmeans++";
  randomState?: number;
})

options.nClusters

number

default: 8

Number of clusters to form.

options.maxIter

number

default: 300

Maximum number of iterations of the k-means algorithm.

options.tol

number

default: 1e-4

Convergence tolerance. The algorithm stops when the change in inertia between consecutive iterations falls below this threshold.
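In symbols, reading "change in inertia" as the absolute difference between consecutive iterations (an assumption; the exact comparison deepbox performs is not stated here), with x_i a sample, c_i its assigned cluster, and μ^(t) the centroids after iteration t:

```latex
J^{(t)} = \sum_{i=1}^{n} \left\lVert x_i - \mu^{(t)}_{c_i} \right\rVert_2^2,
\qquad
\text{stop if } \left| J^{(t-1)} - J^{(t)} \right| < \mathrm{tol}
\ \text{ or } \ t = \mathrm{maxIter}
```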

options.init

string

default: "kmeans++"

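Initialization method: ‘random’ or ‘kmeans++’. k-means++ picks each new initial centroid with probability proportional to its squared distance from the nearest centroid already chosen. This spreads the starting centroids apart and usually gives faster convergence and a better final clustering than purely random initialization.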
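A minimal sketch of k-means++ seeding on plain arrays follows. The seeded generator mulberry32 is a hypothetical stand-in for randomState: deepbox's own random generator is not documented here, so the same seed will not reproduce deepbox's centroids.

```typescript
// Sketch of k-means++ seeding on plain number[][] arrays (not deepbox's code).
// Assumption: mulberry32 stands in for the randomState-driven generator.
type Point = number[];

function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

function squaredDistance(a: Point, b: Point): number {
  let sum = 0;
  for (let j = 0; j < a.length; j++) sum += (a[j] - b[j]) ** 2;
  return sum;
}

function kmeansPlusPlus(X: Point[], k: number, rand: () => number): Point[] {
  // First centroid: a sample chosen uniformly at random.
  const chosen = [Math.floor(rand() * X.length)];
  // minD2[i]: squared distance from X[i] to its nearest chosen centroid.
  const minD2 = X.map((x) => squaredDistance(x, X[chosen[0]]));
  while (chosen.length < k) {
    const total = minD2.reduce((s, v) => s + v, 0);
    // Next centroid: index i with probability minD2[i] / total.
    let r = rand() * total;
    let next = X.length - 1;
    for (let i = 0; i < X.length; i++) {
      r -= minD2[i];
      if (r < 0) {
        next = i;
        break;
      }
    }
    chosen.push(next);
    X.forEach((x, i) => {
      minD2[i] = Math.min(minD2[i], squaredDistance(x, X[next]));
    });
  }
  return chosen.map((i) => X[i].slice());
}

const points: Point[] = [[0, 0], [0, 1], [10, 10], [10, 11]];
console.log(kmeansPlusPlus(points, 2, mulberry32(42)));
```

Points already chosen have zero weight, so they cannot be drawn again, and distant points are strongly favored.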

options.randomState

number

Random seed for reproducibility.

Methods

fit

fit(X: Tensor, y?: Tensor): this

Fit K-Means clustering on training data.

X

Tensor

required

Training data of shape (n_samples, n_features)

y

Tensor

Ignored (present only for API compatibility)

Returns: The fitted estimator

predict

predict(X: Tensor): Tensor

Predict cluster labels for samples.

X

Tensor

required

Samples of shape (n_samples, n_features)

Returns: Cluster labels of shape (n_samples,)

fitPredict

fitPredict(X: Tensor, y?: Tensor): Tensor

Fit and predict in one step.

Returns: Cluster labels for the training data, of shape (n_samples,)

Properties

clusterCenters

Tensor

Coordinates of cluster centers of shape (n_clusters, n_features)

labels

Tensor

Cluster label of each training sample, assigned during fitting

inertia

number

Sum of squared distances of samples to their closest cluster center
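As a plain-array illustration of this definition (a sketch, not deepbox's implementation), inertia can be computed directly from the samples and the centers:

```typescript
// Inertia: sum over samples of the squared Euclidean distance to the closest center.
function inertia(X: number[][], centers: number[][]): number {
  let total = 0;
  for (const x of X) {
    let best = Infinity;
    for (const c of centers) {
      let d = 0;
      for (let j = 0; j < x.length; j++) d += (x[j] - c[j]) ** 2;
      best = Math.min(best, d);
    }
    total += best;
  }
  return total;
}

// Four points, each at squared distance 1 from its closest center.
console.log(inertia([[0, 1], [0, -1], [5, 6], [5, 4]], [[0, 0], [5, 5]])); // 4
```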

nIter

number

Number of iterations run

Example

```typescript
import { KMeans } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]]);
const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

const labels = kmeans.predict(X);
console.log('Cluster labels:', labels);
console.log('Centroids:', kmeans.clusterCenters);
```

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Clusters points based on density. Points in high-density regions are grouped together, while points in low-density regions are marked as noise. Algorithm:

For each point, find all neighbors within eps distance
If a point has at least minSamples neighbors, it’s a core point
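Core points within eps of each other are linked into the same cluster; non-core points within eps of a core point (border points) join that core point's cluster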
Points not reachable from any core point are noise (label = -1)
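The four steps can be sketched as follows on plain arrays. This is not deepbox's implementation. It assumes, as scikit-learn does, that a point counts toward its own neighborhood when compared with minSamples, and that "within eps" means distance ≤ eps; with those assumptions it reproduces the labels in the example at the end of this section.

```typescript
// Sketch of DBSCAN on plain number[][] arrays (not deepbox's implementation).
// Assumptions: a point counts toward its own neighborhood; "within eps" is <= eps.
type Point = number[];
type Metric = "euclidean" | "manhattan";

function distance(a: Point, b: Point, metric: Metric): number {
  let sum = 0;
  for (let j = 0; j < a.length; j++) {
    const diff = Math.abs(a[j] - b[j]);
    sum += metric === "euclidean" ? diff * diff : diff;
  }
  return metric === "euclidean" ? Math.sqrt(sum) : sum;
}

function dbscan(X: Point[], eps = 0.5, minSamples = 5, metric: Metric = "euclidean") {
  const n = X.length;
  // Neighbor search: brute force, O(n^2) distance evaluations.
  const neighbors = X.map((p) => {
    const idx: number[] = [];
    for (let j = 0; j < n; j++) if (distance(p, X[j], metric) <= eps) idx.push(j);
    return idx;
  });
  // Core points have at least minSamples points (themselves included) within eps.
  const isCore = neighbors.map((nb) => nb.length >= minSamples);
  const labels = new Array<number>(n).fill(-1); // -1 = noise unless reached later
  let clusterId = 0;
  for (let i = 0; i < n; i++) {
    if (!isCore[i] || labels[i] !== -1) continue;
    // Grow a new cluster outward from this unlabeled core point.
    labels[i] = clusterId;
    const stack = [i];
    while (stack.length > 0) {
      const p = stack.pop()!;
      if (!isCore[p]) continue; // border points join the cluster but do not expand it
      for (const q of neighbors[p]) {
        if (labels[q] === -1) {
          labels[q] = clusterId;
          stack.push(q);
        }
      }
    }
    clusterId++;
  }
  const coreIndices = isCore.map((c, i) => (c ? i : -1)).filter((i) => i >= 0);
  return { labels, nClusters: clusterId, coreIndices };
}

const X: Point[] = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]];
const result = dbscan(X, 3, 2);
console.log(result.labels); // [0, 0, 0, 1, 1, -1]
```

The brute-force neighbor search keeps the sketch short; spatial indexes such as k-d trees are the usual way to avoid the quadratic cost on larger inputs.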

Advantages:

No need to specify number of clusters
Can find arbitrarily shaped clusters
Robust to outliers, which are labeled as noise instead of being forced into a cluster

Constructor

new DBSCAN(options?: {
  eps?: number;
  minSamples?: number;
  metric?: "euclidean" | "manhattan";
})

options.eps

number

default: 0.5

Maximum distance between two samples for one to be considered in the neighborhood of the other.

options.minSamples

number

default: 5

Number of samples in a neighborhood for a point to be considered a core point.

options.metric

string

default: "euclidean"

Distance metric: ‘euclidean’ or ‘manhattan’.
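The metric changes which points fall inside an eps-neighborhood. In this plain-array sketch (illustrative values, not deepbox code), two points 5 apart under Euclidean distance are 7 apart under Manhattan distance, so with eps = 6 they are neighbors only under the Euclidean metric:

```typescript
// Euclidean (L2) vs. Manhattan (L1) distance between two points.
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((s, v, j) => s + (v - b[j]) ** 2, 0));
}

function manhattan(a: number[], b: number[]): number {
  return a.reduce((s, v, j) => s + Math.abs(v - b[j]), 0);
}

const p = [0, 0];
const q = [3, 4];
const eps = 6;
console.log(euclidean(p, q), euclidean(p, q) <= eps); // 5 true
console.log(manhattan(p, q), manhattan(p, q) <= eps); // 7 false
```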

Methods

fit

fit(X: Tensor, y?: Tensor): this

Perform DBSCAN clustering on data X.

X

Tensor

required

Training data of shape (n_samples, n_features)

Returns: The fitted estimator

predict

predict(X: Tensor): Tensor

Throws: NotImplementedError. DBSCAN is a transductive clustering algorithm: it labels only the data it was fitted on and cannot assign labels to new samples. Use fitPredict() instead.

fitPredict

fitPredict(X: Tensor, y?: Tensor): Tensor

Fit DBSCAN and return cluster labels.

X

Tensor

required

Training data of shape (n_samples, n_features)

Returns: Cluster labels of shape (n_samples,). Noise points are labeled -1.

Properties

labels

Tensor

Cluster labels assigned during fitting. Noise points are labeled -1.

nClusters

number

Number of clusters found (excluding noise).
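Given a label array in this format, the same count can be recovered by counting distinct labels and skipping the noise label -1 (a plain-array sketch; countClusters is a hypothetical helper, not part of deepbox):

```typescript
// Number of clusters = distinct labels, excluding noise (-1).
function countClusters(labels: number[]): number {
  return new Set(labels.filter((l) => l !== -1)).size;
}

console.log(countClusters([0, 0, 0, 1, 1, -1])); // 2
```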

coreIndices

number[]

Indices of core samples. Core samples are points with at least minSamples neighbors within eps.

Example

```typescript
import { DBSCAN } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]);
const dbscan = new DBSCAN({ eps: 3, minSamples: 2 });
const labels = dbscan.fitPredict(X);
// labels: [0, 0, 0, 1, 1, -1]  (-1 = noise)
```
