Understanding K-Means Clustering

Chaitanya Sagar has a good explanation of the assumptions k-means clustering makes:

Why do we assume in the first place? The answer is that making assumptions helps simplify problems and simplified problems can then be solved accurately. To divide your dataset into clusters, one must define the criteria of a cluster and those make the assumptions for the technique. K-Means clustering method considers two assumptions regarding the clusters – first that the clusters are spherical and second that the clusters are of similar size. Spherical assumption helps in separating the clusters when the algorithm works on the data and forms clusters. If this assumption is violated, the clusters formed may not be what one expects. On the other hand, assumption over the size of clusters helps in deciding the boundaries of the cluster. This assumption helps in calculating the number of data points each cluster should have. This assumption also gives an advantage. Clusters in K-means are defined by taking the mean of all the data points in the cluster. With this assumption, one can start with the centers of clusters anywhere. Keeping the starting points of the clusters anywhere will still make the algorithm converge with the same final clusters as keeping the centers as far apart as possible.

Read on as Chaitanya shows several examples; the polar coordinate transformation was quite interesting.  H/T R-Bloggers

Related Posts

Reasons For Using Docker With R

Jeroen Ooms gives us a few reasons why we might want to containerize our R-based products: The flagship of the OpenCPU system is the OpenCPU server: a mature and powerful Linux stack for embedding R in systems and applications. Because OpenCPU is completely open source we can build and ship on DockerHub. A ready-to-go linux server […]

Read More

Unintentional Data

Eric Hollingsworth describes data science as the cost of collecting data approaches zero: Thankfully not only have modern data analysis tools made data collection cheap and easy, they have made the process of exploratory data analysis cheaper and easier as well. Yet when we use these tools to explore data and look for anomalies or […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031