An Intro to k-Means Clustering

Holger von Jouanne-Diedrich takes us through an example of how k-means clustering works:

The guiding principles are:

– The distance between data points within clusters should be as small as possible.
– The distance of the centroids (= centres of the clusters) should be as big as possible.

Because there are too many possible combinations of all possible clusters comprising all possible data points k-means follows an iterative approach

Click through for a demonstration. I appreciate adding visualizations for intermediate steps in there as well because it gives you an intuitive understanding for what the one-liner function is really doing.

Related Posts

Calculating AUC in R

Andrew Treadway shows how you can calculate Area Under the Curve in R: AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For […]

Read More

Python versus R (Again)

Alex Woodie looks at whether Python is dominating R in the data science space: There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python usage surged […]

Read More


July 2019
« Jun Aug »