The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia, or within-cluster sum of squares. This algorithm requires the number of clusters to be specified. It scales well to a large number of samples and has been used across a wide range of application areas in many different fields.
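To make the assign-then-update loop behind this concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the classic iteration for k-means. It also computes the inertia, which is the sum over all samples of the squared distance to the nearest cluster centre. The helper names `kmeans`, `inertia` and `sq_dist`, the random initialisation from input samples, and the `max_iter`/`seed` parameters are illustrative assumptions, not the API of any library. scikit-learn's `KMeans`, for instance, uses k-means++ initialisation by default and can run several restarts (`n_init`).

```python
from __future__ import annotations

import random


def sq_dist(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(points: list[tuple[float, ...]], centroids: list[tuple[float, ...]]) -> float:
    """Within-cluster sum of squares: each point's squared distance to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def kmeans(points: list[tuple[float, ...]], k: int, max_iter: int = 100, seed: int = 0):
    """Hypothetical helper: plain Lloyd iteration, initialised from k randomly chosen samples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k samples drawn without replacement
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters: list[list[tuple[float, ...]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving, so assignments are stable
            break
        centroids = new_centroids
    return centroids, inertia(points, centroids)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    centroids, wcss = kmeans(data, k=2)
    print(sorted(centroids))
    print(round(wcss, 4))  # 1.8333 (= 11/6) for this toy data
```

On these six toy points every possible pair of starting samples ends in the same two clusters. On real data Lloyd's algorithm only reaches a local minimum of the inertia, which is why restarting from several initialisations is common practice.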
Inertia can be recognized as a measure of how internally coherent clusters are. It suffers from various drawbacks:
Inertia makes the assumption that clusters are convex and isotropic, which is not always the case. It responds poorly to elongated clusters, or manifolds with irregular shapes.
Inertia is not a normalized metric: we only know that lower values are better and zero is optimal. But in very high-dimensional spaces, Euclidean distances tend to become inflated (this is an instance of the so-called “curse of dimensionality”). Running a dimensionality reduction algorithm such as principal component analysis (PCA) prior to k-means clustering can alleviate this problem and speed up the computations.
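The distance inflation mentioned above is easy to observe numerically. The sketch below draws points uniformly from the unit hypercube and reports the mean pairwise Euclidean distance and its relative spread as the dimension grows. For uniform points the expected squared distance is exactly d/6, so the mean distance keeps growing while the gap between the nearest and farthest pairs shrinks relative to it. The function name `distance_stats`, the uniform distribution and the point counts are assumptions chosen purely for illustration.

```python
from __future__ import annotations

import math
import random


def distance_stats(n_points: int, dim: int, seed: int = 0) -> tuple[float, float]:
    """Hypothetical helper: (mean pairwise distance, relative spread) for uniform points in [0, 1]^dim.

    Relative spread is (max distance - min distance) / mean distance.
    """
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(pts[i], pts[j]) for i in range(n_points) for j in range(i + 1, n_points)]
    mean = sum(dists) / len(dists)
    return mean, (max(dists) - min(dists)) / mean


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        mean, spread = distance_stats(50, d)
        # E[squared distance] = d / 6, so the mean distance approaches sqrt(d / 6) as d grows.
        print(f"dim={d:5d}  mean={mean:7.3f}  sqrt(d/6)={math.sqrt(d / 6):7.3f}  spread={spread:.3f}")
```

To apply the remedy described in the text with scikit-learn, PCA and k-means can be chained, e.g. `make_pipeline(PCA(n_components=10), KMeans(n_clusters=3))`; the component and cluster counts there are arbitrary placeholders to tune for your data.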