K Nearest Cliques

Vincent Granville explains an algorithm built around finding cliques of data points:

The cliques considered here are defined by circles (in two dimensions) or spheres (in three dimensions.) In the most basic version, we have one clique for each cluster, and the clique is defined as the smallest circle containing a pre-specified proportion p of the points from the cluster in question. If the clusters are well separated, we can even use p = 1. We define the density of a clique as the number of points per unit area. In general, we want to build cliques with high density.

Ideally, we want each cluster in the training set to be covered by a small number of (possibly slightly overlapping) cliques, each one having a high density.  Also, as a general rule, a training set point can only belong to one clique, and (ideally) to only one cluster. But the circles associated with two cliques are allowed to overlap.

It’s an interesting approach, and I can see how it’d be faster than K Nearest Neighbors, but I do wonder how accurate the results would be in comparison to KNN.

Related Posts

Markov Chains In Python

Sandipan Dey shows off various uses of Markov chains as well as how to create one in Python: Perspective. In the 1948 landmark paper A Mathematical Theory of Communication, Claude Shannon founded the field of information theory and revolutionized the telecommunications industry, laying the groundwork for today’s Information Age. In this paper, Shannon proposed using a Markov chain to […]

Read More

More DBA Salary Research

Ginger Grant digs into the DBA salary survey a bit further: I know that I have heard that if you want to make money you need to get into management. Being a good manager is not the same skill set as being a good database professional, and there are many people who do not want to […]

Read More


August 2017
« Jul Sep »