Gavita Regunath has a two-parter on clustering. First, an explanation of the concept:
Clustering, or cluster analysis, is an unsupervised machine learning method. As the name implies, “unsupervised” refers to how the model ‘learns’ from the data: it is the opposite of supervised learning. With supervised learning, models are trained or “supervised” using labelled datasets (a known function output for our data). An example of a supervised learning method is a model trained to recognise animals using examples labelled as cat, dog or rabbit.
Unsupervised learning works with unlabelled data where there are no known function outputs, and the aim is to identify patterns within a dataset. There are many unsupervised learning algorithms; however, the three main types are clustering, dimensionality reduction and anomaly detection. The focus of this blog will be on clustering, as it is the most commonly used unsupervised learning technique.
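To make the idea of finding patterns in unlabelled data concrete, here is a minimal sketch of k-means, one common clustering algorithm. The excerpt above does not name a specific algorithm, so k-means is chosen purely for illustration, and the function name `kmeans`, the sample points and the starting centroids are all assumptions made for this example rather than anything from the original posts.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points; `centroids` are the starting guesses."""
    def nearest(p):
        # Index of the centroid closest to point p (Euclidean distance).
        return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        # Assignment step: give each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(c)  # an empty cluster keeps its old centroid

        if new_centroids == centroids:
            break  # nothing moved, so the assignment has stabilised
        centroids = new_centroids

    return centroids, [nearest(p) for p in points]

# Unlabelled data: two visible groups, but no labels are supplied.
# (Hypothetical sample values chosen for illustration.)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that the cluster numbers 0 and 1 carry no meaning of their own. Unlike the cat, dog and rabbit labels in the supervised example, they only say which points ended up grouped together.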
Second, a review of ten clustering algorithms:
There are many clustering algorithms. In fact, more than 100 clustering algorithms have been published so far. However, despite this variety, they can generally be categorised into four methods. Let’s look at these briefly:
Read on to learn more about clustering.