K Nearest Cliques

Vincent Granville explains an algorithm built around finding cliques of data points:

The cliques considered here are defined by circles (in two dimensions) or spheres (in three dimensions.) In the most basic version, we have one clique for each cluster, and the clique is defined as the smallest circle containing a pre-specified proportion p of the points from the cluster in question. If the clusters are well separated, we can even use p = 1. We define the density of a clique as the number of points per unit area. In general, we want to build cliques with high density.

Ideally, we want each cluster in the training set to be covered by a small number of (possibly slightly overlapping) cliques, each one having a high density.  Also, as a general rule, a training set point can only belong to one clique, and (ideally) to only one cluster. But the circles associated with two cliques are allowed to overlap.

It’s an interesting approach, and I can see how it’d be faster than K Nearest Neighbors, but I do wonder how accurate the results would be in comparison to KNN.

Related Posts

Interpreting The Area Under The Receiver Operating Characteristic Curve

Roos Colman explains what a Receiver Operating Characteristic (ROC) curve is and how we interpret the Area Under the Curve (AUC): The AUC can be defined as “The probability that a randomly selected case will have a higher test result than a randomly selected control”. Let’s use this definition to calculate and visualize the estimated […]

Read More

Naive Bayes Against Large Data Sets

Catherine Bernadorne walks us through using Naive Bayes for sentiment analysis: The more data that is used to train the classifier, the more accurate it will become over time. So if we continue to train it with actual results in 2017, then what it predicts in 2018 will be more accurate. Also, when Bayes gives […]

Read More


August 2017
« Jul Sep »