KMeans
K-Means clustering algorithm. Partitions n samples into k clusters by minimizing the within-cluster sum of squared distances to cluster centroids.
Algorithm: Lloyd’s algorithm (iterative refinement)
- Initialize k centroids (random or k-means++)
- Assign each point to nearest centroid
- Update centroids as mean of assigned points
- Repeat until convergence or max iterations
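The steps above can be sketched in a few dozen lines. This is a minimal illustration of Lloyd's algorithm, not this library's implementation: the names `lloyd`, `nearestCentroid`, `squaredDistance`, and `LloydResult`, and the default values for `maxIter` and `tol`, are assumptions made for the example. It takes the initial centroids from the caller, assumes a non-empty dataset, and keeps the previous centroid when a cluster ends up empty.

```typescript
type Point = number[];

// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(a: Point, b: Point): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) {
    const diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// Index of the centroid closest to p.
function nearestCentroid(p: Point, centroids: Point[]): number {
  let best = 0;
  let bestDist = Infinity;
  for (let j = 0; j < centroids.length; j++) {
    const dist = squaredDistance(p, centroids[j]);
    if (dist < bestDist) {
      bestDist = dist;
      best = j;
    }
  }
  return best;
}

interface LloydResult {
  centroids: Point[];
  labels: number[];
  inertia: number;
  nIter: number;
}

// Hypothetical helper: runs Lloyd's iterations from caller-supplied centroids.
// The defaults for maxIter and tol are illustrative, not taken from the library.
function lloyd(
  points: Point[],
  initial: Point[],
  maxIter: number = 300,
  tol: number = 1e-4
): LloydResult {
  const dim = points[0].length;
  let centroids = initial.map((c) => c.slice());
  let labels: number[] = [];
  let inertia = Infinity;
  let nIter = 0;

  while (nIter < maxIter) {
    nIter++;

    // Assignment step: each point goes to its nearest centroid.
    labels = points.map((p) => nearestCentroid(p, centroids));

    // Update step: each centroid becomes the mean of its assigned points.
    const sums: Point[] = centroids.map(() => points[0].map(() => 0));
    const counts: number[] = centroids.map(() => 0);
    points.forEach((p, i) => {
      counts[labels[i]]++;
      for (let d = 0; d < dim; d++) sums[labels[i]][d] += p[d];
    });
    centroids = centroids.map((c, j) =>
      counts[j] === 0 ? c : sums[j].map((s) => s / counts[j])
    );

    // Convergence check: stop when inertia changes by less than tol.
    const newInertia = points.reduce(
      (acc, p, i) => acc + squaredDistance(p, centroids[labels[i]]),
      0
    );
    const converged = Math.abs(inertia - newInertia) < tol;
    inertia = newInertia;
    if (converged) break;
  }

  return { centroids, labels, inertia, nIter };
}

const demo = lloyd([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]]);
console.log(demo.labels);    // [ 0, 0, 1, 1 ]
console.log(demo.centroids); // [ [ 0, 0.5 ], [ 10, 10.5 ] ]
console.log(demo.inertia);   // 1
```

On this toy input the loop stops after two iterations: the first moves the centroids to the cluster means, and the second confirms that inertia no longer changes.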
Constructor
Number of clusters to form.
Maximum number of iterations of the k-means algorithm.
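Tolerance for convergence. The algorithm stops when the change in inertia between consecutive iterations falls below this threshold.
Initialization method: ‘random’ or ‘kmeans++’. k-means++ spreads the initial centroids apart, which typically leads to faster convergence and a lower final inertia than purely random initialization.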
Random seed for reproducibility.
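To illustrate how the ‘kmeans++’ option and the random seed interact, here is a rough sketch of k-means++ seeding driven by a seeded generator. It is not this library's code: `seededRandom` (a simple linear congruential generator), `kmeansPlusPlusInit`, and `squaredDistance` are hypothetical names, and the sketch assumes a non-empty dataset with k no larger than the number of points.

```typescript
type Point = number[];

// Simple seeded LCG returning values in [0, 1). An assumption for this
// sketch; the library's actual generator is not documented here.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function squaredDistance(a: Point, b: Point): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
  return sum;
}

// Hypothetical helper: picks k initial centroids with k-means++ seeding.
function kmeansPlusPlusInit(points: Point[], k: number, random: () => number): Point[] {
  // First centroid: chosen uniformly at random.
  const centroids: Point[] = [points[Math.floor(random() * points.length)].slice()];

  // minDist[i] = squared distance from point i to its nearest chosen centroid.
  const minDist = points.map((p) => squaredDistance(p, centroids[0]));

  while (centroids.length < k) {
    const total = minDist.reduce((a, b) => a + b, 0);
    let idx = -1;
    if (total > 0) {
      // Sample a point with probability proportional to minDist[i].
      let target = random() * total;
      for (let i = 0; i < points.length; i++) {
        if (minDist[i] === 0) continue; // already covered by a centroid
        idx = i; // remembered as a fallback against rounding error
        target -= minDist[i];
        if (target < 0) break;
      }
    } else {
      // Every point coincides with a chosen centroid: fall back to uniform.
      idx = Math.floor(random() * points.length);
    }
    const next = points[idx].slice();
    centroids.push(next);
    points.forEach((p, i) => {
      minDist[i] = Math.min(minDist[i], squaredDistance(p, next));
    });
  }
  return centroids;
}

// Same seed, same starting centroids.
const initA = kmeansPlusPlusInit([[0, 0], [0, 1], [9, 9], [9, 8]], 2, seededRandom(42));
const initB = kmeansPlusPlusInit([[0, 0], [0, 1], [9, 9], [9, 8]], 2, seededRandom(42));
console.log(JSON.stringify(initA) === JSON.stringify(initB)); // true
```

Because points far from every chosen centroid carry most of the sampling weight, the second centroid in this example is far more likely to come from the opposite group than from the one already seeded.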
Methods
fit
Training data of shape (n_samples, n_features)
Ignored (exists for compatibility)
predict
Samples of shape (n_samples, n_features)
fitPredict
Properties
Coordinates of cluster centers of shape (n_clusters, n_features)
Cluster label assigned to each training sample
Sum of squared distances of samples to their closest cluster center
Number of iterations run
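As a worked illustration of the inertia definition above, the following sketch computes the sum of squared distances from each sample to its closest center. The function name `computeInertia` is hypothetical and is not part of the documented API.

```typescript
// Hypothetical helper: sum over all samples of the squared Euclidean
// distance to the closest center.
function computeInertia(points: number[][], centers: number[][]): number {
  let total = 0;
  for (const p of points) {
    let best = Infinity;
    for (const c of centers) {
      let d2 = 0;
      for (let d = 0; d < p.length; d++) d2 += (p[d] - c[d]) * (p[d] - c[d]);
      best = Math.min(best, d2);
    }
    total += best;
  }
  return total;
}

// (0,0) and (2,0) are each 1 unit from center (1,0); (10,0) sits on its center.
console.log(computeInertia([[0, 0], [2, 0], [10, 0]], [[1, 0], [10, 0]])); // 2
```

Lower inertia means tighter clusters, but it always decreases as k grows, so it is only comparable between runs that use the same number of clusters.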
Example
DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Clusters points based on density. Points in high-density regions are grouped together, while points in low-density regions are marked as noise.
Algorithm:
- For each point, find all neighbors within eps distance
- If a point has at least minSamples neighbors, it’s a core point
- Core points and their neighbors form clusters
- Points not reachable from any core point are noise (label = -1)
Advantages:
- No need to specify the number of clusters
- Can find arbitrarily shaped clusters
- Robust to outliers
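The density-based procedure described above can be sketched as follows. This is an illustrative brute-force version, not this library's implementation: `dbscan`, `regionQuery`, and `euclidean` are hypothetical names, neighbor search is O(n²), and the sketch assumes a point counts as its own neighbor when compared against minSamples (the convention scikit-learn uses; the source does not say which convention applies here).

```typescript
type Point = number[];

const NOISE = -1;
const UNVISITED = -2;

function euclidean(a: Point, b: Point): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
  return Math.sqrt(sum);
}

// Indices of all points within eps of points[i], including i itself (assumption).
function regionQuery(points: Point[], i: number, eps: number): number[] {
  const neighbors: number[] = [];
  for (let j = 0; j < points.length; j++) {
    if (euclidean(points[i], points[j]) <= eps) neighbors.push(j);
  }
  return neighbors;
}

// Hypothetical helper: returns one label per point, with -1 marking noise.
function dbscan(points: Point[], eps: number, minSamples: number): number[] {
  const labels: number[] = points.map(() => UNVISITED);
  let cluster = 0;

  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;

    const neighbors = regionQuery(points, i, eps);
    if (neighbors.length < minSamples) {
      labels[i] = NOISE; // may later be relabeled as a border point
      continue;
    }

    // i is a core point: start a new cluster and expand it.
    labels[i] = cluster;
    const stack = neighbors.slice();
    while (stack.length > 0) {
      const j = stack.pop() as number;
      if (labels[j] === NOISE) labels[j] = cluster; // border point joins the cluster
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const jNeighbors = regionQuery(points, j, eps);
      if (jNeighbors.length >= minSamples) {
        // j is also a core point, so its neighbors are reachable too.
        for (const n of jNeighbors) stack.push(n);
      }
    }
    cluster++;
  }
  return labels;
}

const sample: Point[] = [
  [0, 0], [0, 1], [1, 0],       // dense group
  [10, 10], [10, 11], [11, 10], // second dense group
  [50, 50],                     // isolated point
];
console.log(dbscan(sample, 1.5, 3)); // [ 0, 0, 0, 1, 1, 1, -1 ]
```

The number of clusters is never passed in; it falls out of eps and minSamples, and the isolated point is reported as noise rather than forced into a cluster.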
Constructor
Maximum distance between two samples for one to be considered in the neighborhood of the other.
Minimum number of samples in a neighborhood for a point to be considered a core point.
Distance metric: ‘euclidean’ or ‘manhattan’.
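The choice of metric changes which points fall inside the eps neighborhood. The sketch below shows the two standard formulas; the function names are illustrative and are not taken from this library.

```typescript
// Euclidean (straight-line) distance: square root of the sum of squared differences.
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
  return Math.sqrt(sum);
}

// Manhattan (city-block) distance: sum of absolute differences.
function manhattanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += Math.abs(a[d] - b[d]);
  return sum;
}

console.log(euclideanDistance([0, 0], [3, 4])); // 5
console.log(manhattanDistance([0, 0], [3, 4])); // 7
```

With eps = 6, these two points would be neighbors under the Euclidean metric but not under the Manhattan metric, so the same eps can produce different clusters depending on the metric.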
Methods
fit
Training data of shape (n_samples, n_features)
predict
Throws NotImplementedError. DBSCAN is a transductive clustering algorithm, so it does not support prediction on new data. Use fitPredict() instead.
fitPredict
Training data of shape (n_samples, n_features)
Properties
Cluster labels assigned during fitting. Noise points are labeled -1.
Number of clusters found (excluding noise).
Indices of core samples. Core samples are points with at least minSamples neighbors within eps.