Overview
Cost functions measure the quality of a clustering solution by quantifying how well the data points fit within their assigned clusters. Lower cost values indicate better clustering quality. The library provides cost calculation functions for both crisp and fuzzy clustering.
getCostValues
Calculates the cost values for each centroid based on the crisp membership and distance matrices.
Parameters
- Crisp membership matrix, where membershipMatrix[i][j] is 1 if point j belongs to cluster i, otherwise 0.
- Matrix of distances, where distanceMatrix[i][j] is the distance between centroid i and point j.
Returns
An array of cost values for each centroid, where each value represents the sum of distances from the centroid to all points assigned to it. Returns an empty array if either input matrix is empty.
Example
For crisp clustering, the cost value for each cluster is simply the sum of distances from the centroid to all points assigned to that cluster.
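The computation described above can be sketched in a few lines of TypeScript. This is a minimal re-implementation for illustration, not the library's code: the function name `crispCostValues` and the sample matrices are hypothetical, and only the matrix layout (rows are clusters, columns are points) and the empty-input behavior are taken from the parameter descriptions.

```typescript
// Hypothetical re-implementation of the crisp cost described above;
// the library's actual getCostValues may differ in details.
function crispCostValues(
  membershipMatrix: number[][],
  distanceMatrix: number[][],
): number[] {
  // Mirror the documented behavior: empty input yields an empty array.
  if (membershipMatrix.length === 0 || distanceMatrix.length === 0) {
    return [];
  }
  // For each cluster i, sum the distances of the points assigned to it.
  return membershipMatrix.map((row, i) =>
    row.reduce((sum, u, j) => sum + u * distanceMatrix[i][j], 0),
  );
}

// Two clusters, three points (illustrative data):
// points 0 and 1 belong to cluster 0, point 2 belongs to cluster 1.
const membership = [
  [1, 1, 0],
  [0, 0, 1],
];
const distances = [
  [1, 2, 5],
  [4, 3, 1],
];

const crispCosts = crispCostValues(membership, distances);
console.log(crispCosts); // [3, 1]
```

Cluster 0 sums the distances 1 and 2 of its two assigned points, while the distance 5 to the unassigned point is ignored because its membership is 0.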
getFuzzyCostValues
Calculates the cost values for each centroid based on the fuzzy membership and distance matrices.
Parameters
- Matrix of distances, where distanceMatrix[i][j] is the distance between centroid i and point j.
- Fuzzy membership matrix, where membershipMatrix[i][j] is the degree of membership (0 to 1) of point j in cluster i.
- Fuzzification parameter (typically denoted as m). Controls the degree of fuzziness. A common value is 2.
Returns
An array of cost values for each centroid, calculated using weighted squared distances. Returns an empty array if either input matrix is empty.
Example
For fuzzy clustering, each point contributes to the cost of all clusters, weighted by its membership degree raised to the fuzzy parameter.
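As a rough illustration of this weighting, the sketch below re-implements the described calculation in TypeScript. The name `fuzzyCostValues`, its parameter names, and the sample data are assumptions made for the example; the argument order follows the parameter list above, but the library's actual signature is not shown in this page.

```typescript
// Hypothetical re-implementation of the fuzzy cost described above;
// the library's actual getFuzzyCostValues may differ in details.
function fuzzyCostValues(
  distanceMatrix: number[][],
  membershipMatrix: number[][],
  fuzzyParameter: number,
): number[] {
  // Mirror the documented behavior: empty input yields an empty array.
  if (distanceMatrix.length === 0 || membershipMatrix.length === 0) {
    return [];
  }
  // For each cluster i: sum over points j of (u_ij)^m * (d_ij)^2.
  return membershipMatrix.map((row, i) =>
    row.reduce((sum, u, j) => {
      const d = distanceMatrix[i][j];
      return sum + Math.pow(u, fuzzyParameter) * d * d;
    }, 0),
  );
}

// Two clusters, two points (illustrative data).
const fuzzyDistances = [
  [1, 2],
  [3, 1],
];
// Each column sums to 1: point 0 leans toward cluster 0, point 1 toward cluster 1.
const fuzzyMembership = [
  [0.75, 0.25],
  [0.25, 0.75],
];

const fuzzyCosts = fuzzyCostValues(fuzzyDistances, fuzzyMembership, 2);
console.log(fuzzyCosts); // [0.8125, 1.125]
```

For cluster 0 the terms are 0.75² · 1² = 0.5625 and 0.25² · 2² = 0.25, so both points contribute even though point 1 mostly belongs to the other cluster.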
Mathematical Formula
The fuzzy cost value for cluster i is calculated as:
Jᵢ = Σⱼ (uᵢⱼ)^m * (dᵢⱼ)²
where:
- Jᵢ is the cost value for cluster i
- uᵢⱼ is the membership of point j in cluster i
- m is the fuzzification parameter
- dᵢⱼ is the distance from point j to centroid i
- The sum is over all points j
getCostFunction
Calculates the total cost from an array of individual cost values.
Parameters
- Array of individual cost values (one per cluster), typically obtained from getCostValues() or getFuzzyCostValues().
Returns
The total cost as the sum of all individual cost values. Returns 0 if the input array is empty.
Example
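A minimal TypeScript sketch of this summation follows. The function name `totalCost` and the sample values are hypothetical stand-ins; the sketch only mirrors the documented behavior (sum of the per-cluster values, 0 for an empty array) and is not the library's implementation.

```typescript
// Hypothetical re-implementation of the total cost described above.
function totalCost(costValues: number[]): number {
  // reduce with an initial value of 0 returns 0 for an empty array.
  return costValues.reduce((sum, value) => sum + value, 0);
}

// Illustrative per-cluster costs, e.g. as produced by a per-cluster cost function.
const perClusterCosts = [3, 1, 2.5];

const total = totalCost(perClusterCosts);
console.log(total); // 6.5
console.log(totalCost([])); // 0
```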
Usage in Iteration
The total cost function is typically used to track convergence during the iterative clustering process. The cost typically decreases from one iteration to the next as the algorithm converges toward a (possibly local) optimum. Monitoring it helps determine when to stop iterating, for example once the change between consecutive iterations falls below a small tolerance.
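The convergence check can be sketched with a toy one-dimensional k-means loop. Everything in this TypeScript example is an assumption made for illustration: the helper names, the data, the tolerance, and the choice of stopping rule (absolute change in total cost) are not taken from the library.

```typescript
// Toy 1-D k-means loop; all names and values are hypothetical.
const points = [1, 2, 9, 10];
let centroids = [0, 5];
const tolerance = 1e-6;
const maxIterations = 100;

// Total crisp cost: each point contributes its distance to the nearest centroid.
function totalCrispCost(data: number[], centers: number[]): number {
  return data.reduce(
    (sum, p) => sum + Math.min(...centers.map((c) => Math.abs(p - c))),
    0,
  );
}

// One update step: assign each point to its nearest centroid,
// then move each centroid to the mean of its assigned points.
function updateCentroids(data: number[], centers: number[]): number[] {
  return centers.map((c, i) => {
    const assigned = data.filter((p) => {
      const dists = centers.map((other) => Math.abs(p - other));
      return dists.indexOf(Math.min(...dists)) === i;
    });
    // Keep the old position if no point is assigned to this centroid.
    if (assigned.length === 0) {
      return c;
    }
    return assigned.reduce((a, b) => a + b, 0) / assigned.length;
  });
}

const costHistory: number[] = [totalCrispCost(points, centroids)];
for (let iteration = 0; iteration < maxIterations; iteration++) {
  centroids = updateCentroids(points, centroids);
  const cost = totalCrispCost(points, centroids);
  const previous = costHistory[costHistory.length - 1];
  costHistory.push(cost);
  // Stop once the total cost no longer changes meaningfully.
  if (Math.abs(previous - cost) < tolerance) {
    break;
  }
}

console.log(costHistory); // [12, 2, 2]
console.log(centroids); // [1.5, 9.5]
```

The cost drops from 12 to 2 after the first update and then stays at 2, which triggers the stopping rule. A relative tolerance or a cap on iterations alone are equally plausible choices.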
Crisp vs Fuzzy Cost Calculation
Crisp Cost
- Binary weighting: Only considers points that belong to the cluster (membership = 1)
- Simple sum: Adds up distances for all assigned points
- Formula: Jᵢ = Σⱼ uᵢⱼ * dᵢⱼ, where uᵢⱼ ∈ {0, 1}
Fuzzy Cost
- Graded weighting: Considers all points, each weighted by its membership degree (a value between 0 and 1 rather than a hard assignment)
- Weighted sum: Uses membership values raised to the fuzzy parameter
- Formula: Jᵢ = Σⱼ (uᵢⱼ)^m * (dᵢⱼ)²
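To make the difference between the two formulas concrete, the TypeScript sketch below evaluates both for a single cluster. The helper names `crispRowCost` and `fuzzyRowCost` and the numbers are hypothetical; the code only applies the two formulas as written above.

```typescript
// Cost of one cluster under the crisp formula: sum of u_j * d_j.
function crispRowCost(u: number[], d: number[]): number {
  return u.reduce((sum, uj, j) => sum + uj * d[j], 0);
}

// Cost of one cluster under the fuzzy formula: sum of (u_j)^m * (d_j)^2.
function fuzzyRowCost(u: number[], d: number[], m: number): number {
  return u.reduce((sum, uj, j) => sum + Math.pow(uj, m) * d[j] * d[j], 0);
}

// Distances from one centroid to three points (illustrative values).
const rowDistances = [1, 2, 5];

// Crisp memberships: the first two points are assigned, the third is not.
const crispRow = [1, 1, 0];
// Graded memberships for the same cluster.
const gradedRow = [0.75, 0.5, 0.25];

const crispResult = crispRowCost(crispRow, rowDistances);
const fuzzyOnCrisp = fuzzyRowCost(crispRow, rowDistances, 2);
const fuzzyOnGraded = fuzzyRowCost(gradedRow, rowDistances, 2);

console.log(crispResult); // 3
console.log(fuzzyOnCrisp); // 5
console.log(fuzzyOnGraded); // 3.125
```

With 0/1 memberships the exponent m has no effect, because 0 and 1 are unchanged by any positive power, so the two formulas then differ only in squaring the distance. With graded memberships, the distant third point contributes 0.25² · 5² = 1.5625 to the fuzzy cost even though its membership is low.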