Mann-Whitney U Test in SQL

Phil Factor continues his Statistics in SQL series with the Mann-Whitney U test:

There are several ways that you can test this, but nobody is going to argue with you if you use a Mann–Whitney U test to test whether two samples come from the same distribution. It doesn’t require that the data has any particular type of distribution. It just requires that each observation is done by a different member of the population so that all the observations from both groups are independent of each other. It is really just a test of differences in mean-rank between two populations’ pooled ranking. To test this difference It has to be possible to compare any of the observations with any of the others and say which of the two are greater. Your objective is to disprove the assumption that The distributions of both populations are equal. Calculating a measure of the difference is simple, and was designed to be done easily by hand before computers. The probability that the observed difference occurred by chance is easily calculated for large samples because U then approximates to the normal distribution, but it is complex for small samples. Here, we have a small sample and are just interested in whether the two-tailed test is signifcant at the five percent level so we dodge the bullet by using a significance lookup table for the critical value of U.

Read on for Phil’s implementation of the test.

Related Posts

Recently Added String Functions

Lori Brown covers a few string functions added to SQL Server in the past two versions: STRING_ESCAPE (https://docs.microsoft.com/en-us/sql/t-sql/functions/string-escape-transact-sql) This function is available starting with SQL 2016 and is currently only able to escape JSON characters. To me it’s not super useful just yet but hopefully they will add more types soon. 1 SELECT STRING_ESCAPE(‘SQLRX’‘s beginning was […]

Read More

Finding Overlapping Data Ranges

Louis Davidson shows how to find groups of data which overlap: This week, I had a problem where I needed to find and eliminate from the results of my query, data with overlapping ranges. I have written about this topic before, in my database design book book, in regards to building a trigger to avoid overlapping […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31