Mann-Whitney U Test in SQL

Phil Factor continues his Statistics in SQL series with the Mann-Whitney U test:

There are several ways that you can test this, but nobody is going to argue with you if you use a Mann–Whitney U test to test whether two samples come from the same distribution. It doesn’t require that the data has any particular type of distribution. It just requires that each observation is done by a different member of the population so that all the observations from both groups are independent of each other. It is really just a test of differences in mean-rank between two populations’ pooled ranking. To test this difference It has to be possible to compare any of the observations with any of the others and say which of the two are greater. Your objective is to disprove the assumption that The distributions of both populations are equal. Calculating a measure of the difference is simple, and was designed to be done easily by hand before computers. The probability that the observed difference occurred by chance is easily calculated for large samples because U then approximates to the normal distribution, but it is complex for small samples. Here, we have a small sample and are just interested in whether the two-tailed test is signifcant at the five percent level so we dodge the bullet by using a significance lookup table for the critical value of U.

Read on for Phil’s implementation of the test.

Related Posts

Random Forests In R

Anish Sing Walia explains the basics of random forests and provides sample code in R: Random Forests are similar to a famous Ensemble technique called Bagging but have a different tweak in it. In Random Forests the idea is to decorrelate the several trees which are generated on the different bootstrapped samples from training Data.And […]

Read More

Reading Drive Files Recursively From T-SQL

Slava Murygin shows how to use a common table expression to read a recursive drive listing: In this blog I’ll use undocumented in SQL Server extended stored procedure “xp_dirtree“. That procedure is well described in Patrick Keisler’s blog. Briefly, “xp_dirtree” extended procedure returns tree of sub-directories from given folder and has three parameters: 1. Starting or root folder […]

Read More


July 2017
« Jun