# K-Means Clustering
```ts
import { tensor } from "deepbox/ndarray";
import { KMeans } from "deepbox/ml";

// Create data with natural clusters
const clusterData = tensor([
  [1, 2],
  [1.5, 1.8],
  [5, 8],
  [8, 8],
  [1, 0.6],
  [9, 11],
  [8, 2],
  [10, 2],
  [9, 3],
]);

// Create and train K-Means
const kmeans = new KMeans({
  nClusters: 3, // Number of clusters to find
  randomState: 42, // For reproducibility
});
kmeans.fit(clusterData);

console.log("K-Means Clustering");
console.log(`Number of iterations: ${kmeans.nIter}`);
console.log(`Inertia: ${kmeans.inertia.toFixed(4)}`);

const clusterLabels = kmeans.predict(clusterData);
console.log("\nCluster assignments:");
console.log(clusterLabels.toString());

console.log("\nCluster centers:");
console.log(kmeans.clusterCenters.toString());
```
Output:

```
Cluster assignments:
Tensor([0, 0, 1, 1, 0, 2, 2, 2, 2])

Cluster centers:
Tensor([[1.1667, 1.4667],
        [6.5000, 8.0000],
        [9.0000, 4.5000]])
```
```ts
// Predict cluster membership for unseen points
const newData = tensor([
  [2, 1.5],
  [8, 9],
  [9, 2.5],
]);

const newLabels = kmeans.predict(newData);
console.log("\nPredictions for new data:");
console.log(newLabels.toString());

// Elbow method: fit with several values of k and compare inertia
console.log("\nElbow method - testing different cluster counts:");
for (let k = 2; k <= 5; k++) {
  const km = new KMeans({ nClusters: k, randomState: 42 });
  km.fit(clusterData);
  console.log(`k=${k}: inertia=${km.inertia.toFixed(4)}`);
}
```
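Reading the elbow off the printed numbers can also be automated with a simple heuristic: pick the k where the drop in inertia achieved by reaching k is much larger than the drop achieved by going past it. The inertia values below are hypothetical placeholders for illustration, not output from the run above:

```ts
// Hypothetical inertia values for k = 2..5 (illustrative only).
const ks = [2, 3, 4, 5];
const inertias = [180.0, 64.8, 40.2, 30.1];

// For each interior k, compare the inertia drop before k with the drop after k.
// A large ratio means adding the k-th cluster helped a lot, but adding more
// clusters beyond it gives diminishing returns.
let bestK = ks[1];
let bestRatio = 0;
for (let i = 1; i < ks.length - 1; i++) {
  const dropBefore = inertias[i - 1] - inertias[i];
  const dropAfter = inertias[i] - inertias[i + 1];
  const ratio = dropBefore / dropAfter;
  if (ratio > bestRatio) {
    bestRatio = ratio;
    bestK = ks[i];
  }
}
console.log(`Elbow at k=${bestK}`);
```

This is only a rough stand-in for eyeballing the plot; in practice you would also sanity-check the chosen k against domain knowledge.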
## Understanding K-Means
- Inertia: Sum of squared distances from each point to its nearest cluster center (lower is better)
- Cluster centers: Mean position of the points assigned to each cluster
- Convergence: The algorithm stops when cluster assignments no longer change (or an iteration limit is reached)
- Elbow method: Look for the "elbow" in a plot of inertia vs. k to choose a good k
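To make these bullets concrete, here is a minimal, dependency-free sketch of one Lloyd iteration on the same nine points: assign each point to its nearest center, recompute each center as the mean of its assigned points, and measure inertia. Everything here is plain TypeScript for illustration; none of these names are part of the deepbox API, the starting centers are arbitrary, and the empty-cluster edge case is not handled:

```ts
type Point = [number, number];

const points: Point[] = [
  [1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6],
  [9, 11], [8, 2], [10, 2], [9, 3],
];

// Squared Euclidean distance between two 2-D points.
const sqDist = (a: Point, b: Point): number =>
  (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2;

// Assignment step: index of the nearest center for each point.
function assign(pts: Point[], centers: Point[]): number[] {
  return pts.map((p) => {
    let best = 0;
    for (let c = 1; c < centers.length; c++) {
      if (sqDist(p, centers[c]) < sqDist(p, centers[best])) best = c;
    }
    return best;
  });
}

// Update step: each center becomes the mean of its assigned points
// (assumes every cluster received at least one point).
function update(pts: Point[], labels: number[], k: number): Point[] {
  const sums: Point[] = Array.from({ length: k }, () => [0, 0] as Point);
  const counts = new Array(k).fill(0);
  pts.forEach((p, i) => {
    sums[labels[i]][0] += p[0];
    sums[labels[i]][1] += p[1];
    counts[labels[i]]++;
  });
  return sums.map((s, c) => [s[0] / counts[c], s[1] / counts[c]] as Point);
}

// Inertia: sum of squared distances to each point's assigned center.
function inertiaOf(pts: Point[], labels: number[], centers: Point[]): number {
  return pts.reduce((acc, p, i) => acc + sqDist(p, centers[labels[i]]), 0);
}

// One iteration from arbitrary starting centers.
let centers: Point[] = [[1, 2], [5, 8], [8, 2]];
const labels = assign(points, centers);
centers = update(points, labels, 3);
const inertia = inertiaOf(points, labels, centers);
console.log(labels, centers, inertia.toFixed(4));
```

A real implementation repeats the assign/update pair until the labels stop changing, which is exactly the convergence criterion in the bullet list above.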
## Use Cases
- Customer segmentation
- Image compression
- Anomaly detection
- Document clustering
- Feature engineering
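One of these, anomaly detection, falls straight out of the distance computation k-means already performs: a point that is far from every cluster center is a candidate anomaly. A minimal sketch in plain TypeScript, where the centers are taken from the example output above and the threshold and candidate points are illustrative choices, not part of deepbox:

```ts
type Point = [number, number];

const centers: Point[] = [[1.1667, 1.4667], [6.5, 8.0], [9.0, 4.5]];

// Distance from a point to its nearest cluster center.
function nearestDist(p: Point, cs: Point[]): number {
  return Math.min(...cs.map((c) => Math.hypot(p[0] - c[0], p[1] - c[1])));
}

// Flag points whose nearest-center distance exceeds a chosen cutoff.
const candidates: Point[] = [[1.2, 1.5], [8, 9], [25, 25]];
const threshold = 5; // illustrative cutoff; tune per dataset
const anomalies = candidates.filter((p) => nearestDist(p, centers) > threshold);
console.log(anomalies);
```

The cutoff is the whole game here: a common refinement is to set it from the distribution of training-point distances (for example, a high percentile) rather than picking a constant.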
## Next Steps

- PCA: Reduce dimensionality before clustering
- Gaussian Naive Bayes: Probabilistic classification