Clustering groups similar data points together without labeled training data. This guide covers K-Means, one of the most widely used clustering algorithms.

K-Means Clustering

Create and train a K-Means model

Group data points into clusters:
import { tensor } from "deepbox/ndarray";
import { KMeans } from "deepbox/ml";

// Create data with natural clusters
const clusterData = tensor([
  [1, 2],
  [1.5, 1.8],
  [5, 8],
  [8, 8],
  [1, 0.6],
  [9, 11],
  [8, 2],
  [10, 2],
  [9, 3],
]);

// Create and train K-Means
const kmeans = new KMeans({ 
  nClusters: 3,      // Number of clusters to find
  randomState: 42    // For reproducibility
});
kmeans.fit(clusterData);

console.log("K-Means Clustering");
console.log(`Number of iterations: ${kmeans.nIter}`);
console.log(`Inertia: ${kmeans.inertia.toFixed(4)}`);
Output:
K-Means Clustering
Number of iterations: 3
Inertia: 8.2500
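To see what the reported inertia measures, here is a standalone TypeScript sketch (plain arrays, no deepbox imports; the toy points and centers are illustrative): inertia is the sum of squared distances from each point to its nearest cluster center.

```typescript
type Point = [number, number];

function squaredDistance(a: Point, b: Point): number {
  return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2;
}

// Inertia: sum over all points of the squared distance to the nearest center.
function inertia(points: Point[], centers: Point[]): number {
  let total = 0;
  for (const p of points) {
    total += Math.min(...centers.map((c) => squaredDistance(p, c)));
  }
  return total;
}

// Two tight toy clusters around (0, 0.5) and (10, 10.5).
const pts: Point[] = [[0, 0], [0, 1], [10, 10], [10, 11]];
const ctrs: Point[] = [[0, 0.5], [10, 10.5]];
console.log(inertia(pts, ctrs)); // 1 — each point is 0.5 away, so 4 × 0.25
```

Lower inertia means points sit closer to their assigned centers, which is why the algorithm minimizes it.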
Predict cluster assignments

Assign data points to their nearest cluster:
const clusterLabels = kmeans.predict(clusterData);
console.log("\nCluster assignments:");
console.log(clusterLabels.toString());

console.log("\nCluster centers:");
console.log(kmeans.clusterCenters.toString());
Output:
Cluster assignments:
Tensor([0, 0, 1, 1, 0, 2, 2, 2, 2])

Cluster centers:
Tensor([[1.1667, 1.4667],
        [6.5000, 8.0000],
        [9.0000, 4.5000]])
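Each row of the cluster centers is just the mean of the points assigned to that cluster. The following standalone sketch (plain TypeScript, no deepbox imports) recomputes them from the data and labels shown above:

```typescript
const points: [number, number][] = [
  [1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11], [8, 2], [10, 2], [9, 3],
];
const labels = [0, 0, 1, 1, 0, 2, 2, 2, 2]; // assignments from above

// A cluster's center is the coordinate-wise mean of its member points.
function centersFromLabels(
  pts: [number, number][],
  lbls: number[],
  k: number,
): [number, number][] {
  return Array.from({ length: k }, (_, i) => {
    const mine = pts.filter((_, j) => lbls[j] === i);
    return [
      mine.reduce((s, p) => s + p[0], 0) / mine.length,
      mine.reduce((s, p) => s + p[1], 0) / mine.length,
    ];
  });
}

console.log(centersFromLabels(points, labels, 3));
// [[1.1667, 1.4667], [6.5, 8], [9, 4.5]] (up to rounding)
```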
Cluster new data points

Predict clusters for unseen data:
const newData = tensor([
  [2, 1.5],
  [8, 9],
  [9, 2.5],
]);

const newLabels = kmeans.predict(newData);
console.log("\nPredictions for new data:");
console.log(newLabels.toString());
Output:
Predictions for new data:
Tensor([0, 1, 2])
Choosing the number of clusters

Use the elbow method to find the optimal cluster count:
console.log("\nElbow method - testing different cluster counts:");

for (let k = 2; k <= 5; k++) {
  const km = new KMeans({ nClusters: k, randomState: 42 });
  km.fit(clusterData);
  console.log(`k=${k}: inertia=${km.inertia.toFixed(4)}`);
}
Output:
Elbow method - testing different cluster counts:
k=2: inertia=32.5625
k=3: inertia=8.2500
k=4: inertia=4.1250
k=5: inertia=2.0625
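One way to make the elbow visible without a plot is to compare the relative drop in inertia at each k. This standalone sketch (plain TypeScript; the values are the inertias printed above) shows the drop flattening after k=3:

```typescript
// Inertia per cluster count, copied from the output above.
const inertias: Record<number, number> = { 2: 32.5625, 3: 8.25, 4: 4.125, 5: 2.0625 };

for (let k = 3; k <= 5; k++) {
  // Fraction of inertia eliminated by moving from k-1 to k clusters.
  const drop = 1 - inertias[k] / inertias[k - 1];
  console.log(`k=${k}: inertia falls by ${(drop * 100).toFixed(1)}%`);
}
// k=3: inertia falls by 74.7%
// k=4: inertia falls by 50.0%
// k=5: inertia falls by 50.0%
```

The jump to k=3 removes about three quarters of the inertia, while each further cluster removes much less, so the elbow sits at k=3.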

Understanding K-Means

  • Inertia: Sum of squared distances to nearest cluster center (lower is better)
  • Cluster centers: Mean position of all points in each cluster
  • Convergence: Algorithm stops when cluster assignments no longer change
  • Elbow method: Look for “elbow” in inertia plot to choose optimal k
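The points above can be tied together with a minimal sketch of Lloyd's algorithm, the loop typically behind a K-Means fit (plain TypeScript, no deepbox imports; the initialization and toy data are illustrative): alternate assignment and center updates until labels stop changing.

```typescript
type Pt = [number, number];

function kmeansFit(points: Pt[], initial: Pt[]): { centers: Pt[]; labels: number[]; nIter: number } {
  let centers = initial.map((c) => [...c] as Pt);
  let labels: number[] = new Array(points.length).fill(-1);
  let nIter = 0;
  while (true) {
    nIter++;
    // Assignment step: each point joins its nearest center.
    const next = points.map((p) => {
      let best = 0;
      let bestD = Infinity;
      centers.forEach((c, i) => {
        const d = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2;
        if (d < bestD) { bestD = d; best = i; }
      });
      return best;
    });
    // Convergence: stop when assignments no longer change.
    if (next.every((l, i) => l === labels[i])) break;
    labels = next;
    // Update step: move each center to the mean of its assigned points.
    centers = centers.map((c, i) => {
      const mine = points.filter((_, j) => labels[j] === i);
      if (mine.length === 0) return c; // leave an empty cluster's center in place
      return [
        mine.reduce((s, p) => s + p[0], 0) / mine.length,
        mine.reduce((s, p) => s + p[1], 0) / mine.length,
      ] as Pt;
    });
  }
  return { centers, labels, nIter };
}

const data: Pt[] = [[0, 0], [0, 1], [10, 10], [10, 11]];
const result = kmeansFit(data, [[0, 0], [10, 10]]);
console.log(result.labels); // [0, 0, 1, 1]
```

Real implementations add details this sketch omits, such as smarter initialization and multiple restarts, but the assign-then-update loop is the core of the algorithm.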

Use Cases

  • Customer segmentation
  • Image compression
  • Anomaly detection
  • Document clustering
  • Feature engineering

Next Steps

PCA

Reduce dimensionality before clustering

Gaussian Naive Bayes

Probabilistic classification
