Hierarchical Clustering

Chaitanya Sagar explains hierarchical clustering with examples in R:

Hope now you have a better understanding of clustering algorithms than what you started with. We discussed about Divisive and Agglomerative clustering techniques and four linkage methods namely, Single, Complete, Average and Ward’s method. Next, we implemented the discussed techniques in R using a numeric dataset. Note that we didn’t have any categorical variable in the dataset we used. You need to treat the categorical variables in order to incorporate them into a clustering algorithm. Lastly, we discussed a couple of plots to visualise the clusters/groups formed. Note here that we have assumed value of ‘k’ (number of clusters) is known. However, this is not always the case. There are a number of heuristics and rules-of-thumb for picking number of clusters. A given heuristic will work better on some datasets than others. It’s best to take advantage of domain knowledge to help set the number of clusters, if that’s possible. Otherwise, try a variety of heuristics, and perhaps a few different values of k.

There’s a lot to pick out of this post, but you’re able to walk through it step by step.  H/T R-Bloggers

Related Posts

Dependencies as Risks

John Mount makes the point that packages dependencies are innately a risk: If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish the correctness of all of […]

Read More

Custom ggplot2 Fonts

Daniel Oehm shares two techniques for using custom fonts in your ggplot2 visuals: ggplot – You can spot one from a mile away, which is great! And when you do it’s a silent fist bump. But sometimes you want more than the standard theme. Fonts can breathe new life into your plots, helping to match […]

Read More


December 2017
« Nov Jan »