Measuring Correlation In SQL

Phil Factor shows how to calculate Kendall’s Tau and Spearman’s Rho in SQL:

Kendall’s Tau rank correlation is a handy way of determining how correlated two variables are, and whether this is more than chance. If you just want a measure of the correlation then you don’t have to assume very much about the distribution of the variables. Kendall’s Tau is popular with calculating correlations with non-parametric data. Spearman’s Rho is possibly more popular for the purpose, but Kendall’s tau has a distribution with better statistical properties (the sample estimate is close to a population variance) so confidence levels are more reliable, but in general, Kendall’s tau and Spearman’s rank correlation coefficient are very similar. The obvious difference between them is that, for the standard method of calculation,  Spearman’s Rank correlation required ranked data as input, whereas the algorithm to calculate Kendall’s Tau does this for you.  Kendall’s Tau consumes any non-parametric data with equal relish.

Kendall’s Tau is easy to calculate on paper, and makes intuitive sense. It deals with the probabilities of observing the agreeable (concordant) and non-agreeable (discordant) pairs of rankings. All observations are paired with each of the others, A concordant pair is one whose members of one observation are both larger than their respective members of the other paired observation, whereas discordant pairs have numbers that differ in opposite directions. Kendall’s Tau-b takes tied rankings into account.

I appreciate Phil putting this series together.  I’d probably stick with R, but it’s good to have options.

Related Posts

Neural Nets On Spark

Nisha Muktewar and Seth Hendrickson show how to use Deeplearning4j to build deep learning models on Hadoop and Spark: Modern convolutional networks can have several hundred million parameters. One of the top-performing neural networks in the Large Scale Visual Recognition Challenge (also known as “ImageNet”), has 140 million parameters to train! These networks not only […]

Read More

New T-SQL Functions

Dennes Torres shows off four T-SQL functions releasing in SQL Server 2017: Translate Translate does the work of several replace functions, simplifying some queries. The function is called ‘Translate’ because its main objective: transform one kind of information in another by doing a bunch of replaces. For example: GeoJson and WKT are two different formats […]

Read More


April 2017
« Mar May »