K Nearest Cliques

Vincent Granville explains an algorithm built around finding cliques of data points:

The cliques considered here are defined by circles (in two dimensions) or spheres (in three dimensions.) In the most basic version, we have one clique for each cluster, and the clique is defined as the smallest circle containing a pre-specified proportion p of the points from the cluster in question. If the clusters are well separated, we can even use p = 1. We define the density of a clique as the number of points per unit area. In general, we want to build cliques with high density.

Ideally, we want each cluster in the training set to be covered by a small number of (possibly slightly overlapping) cliques, each one having a high density.  Also, as a general rule, a training set point can only belong to one clique, and (ideally) to only one cluster. But the circles associated with two cliques are allowed to overlap.

It’s an interesting approach, and I can see how it’d be faster than K Nearest Neighbors, but I do wonder how accurate the results would be in comparison to KNN.

Related Posts

Classifying Texts With Naive Bayes

I continue my series on Naive Bayes with another hand-calculation post: Step two is, on the surface, pretty tough: how do we figure out if a set of words is a business phrase or a baseball phrase? We could try to think up a set of features. For example, how long is the phrase? How many unique […]

Read More

Where Machine Learning And Econometrics Collide

Dave Giles shares some thoughts on how machine learning and econometrics relate: What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)? Those are big questions, but I think that they’re ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I’d […]

Read More


August 2017
« Jul Sep »