
The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia, or within-cluster sum of squares. The number of clusters must be specified in advance. The algorithm scales well to large numbers of samples and has been used across a wide range of application areas in many different fields.
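To make the criterion concrete, here is a minimal sketch in plain Python of what "minimizing the inertia" could look like. It assumes simple Lloyd-style iterations with randomly chosen starting centroids. The function names (`squared_distance`, `assign`, `inertia`, `kmeans`) and the toy data are made up for illustration and are not taken from any particular library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def inertia(points, centroids):
    """Within-cluster sum of squares: each point contributes its squared
    distance to the nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def kmeans(points, k, n_iter=20, seed=0):
    """Illustrative Lloyd iterations; the number of clusters k must be given."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign(points, centroids)


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels, inertia(points, centroids))
```

On this toy data the two groups are recovered and the inertia is small but not zero, because the points in each group do not coincide with their centroid.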

KMeans

Inertia can be read as a measure of how internally coherent the clusters are. It has several drawbacks:

Inertia makes the assumption that clusters are convex and isotropic, which is not always the case. It responds poorly to elongated clusters, or manifolds with irregular shapes.

Inertia is not a normalized metric: we only know that lower values are better and that zero is optimal. In very high-dimensional spaces, Euclidean distances tend to become inflated (an instance of the so-called “curse of dimensionality”). Running a dimensionality reduction algorithm such as principal component analysis (PCA) before k-means clustering can alleviate this problem and speed up the computations.
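The inflation of Euclidean distances can be checked with a small experiment. The sketch below assumes points drawn uniformly at random from the unit hypercube and measures their average pairwise distance as the number of dimensions grows. The helper `mean_pairwise_distance` and its parameters are hypothetical, chosen only for this illustration.

```python
import math
import random


def mean_pairwise_distance(dim, n_points=50, seed=0):
    """Average Euclidean distance between uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total, count = 0.0, 0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            total += math.dist(pts[i], pts[j])
            count += 1
    return total / count


# The average distance keeps growing with the dimension, even though
# every coordinate stays inside the same [0, 1] range.
for dim in (2, 10, 100, 1000):
    print(dim, round(mean_pairwise_distance(dim), 2))
```

Because inertia is a sum of squared distances, it grows along with them, which is one reason why projecting the data onto fewer dimensions before clustering can help.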
