Randomization With NEWID()

Kevin Feasel



Michael J. Swart tests whether ORDER BY NEWID() produces a biased result:

One of his articles, Visualizing Algorithms has some thoughts on shuffling at https://bost.ocks.org/mike/algorithms/#shuffling.

He says that sorting using a random comparator is a rotten way to shuffle things. Not only is it inefficient, but the resulting shuffle is really really biased. He goes on to visualize that bias (again, I really encourage you to go see his stuff).

Ordering by random reminded me of the common technique in SQL Server of ORDER BY newid(). So I wondered whether an obvious bias was present there. So I shuffled 100 items thousands of times and recreated the visualization of bias in a heat map (just like Mike did).

I’ve used this pattern to good effect, but definitely pay attention to Michael’s warning at the end.

Related Posts

Trailing Spaces and String Comparisons

Bert Wagner shows how SQL Server handles trailing spaces when comparing two strings: The LEN() function shows the number of characters in our string, while the DATALENGTH() function shows us the number of bytes used by that string. In this case, DATALENGTH is equal to 10. This result is due to the padded spaces occurring […]

Read More

T-SQL Bugs with Joins

Itzik Ben-Gan takes us through four bugs or oddities around joins: The order counts are now correct, but the total freight values are not. Can you spot the new bug? The new bug is more elusive because it manifests itself only when the same customer has at least one case where multiple orders happen to […]

Read More


June 2018
« May Jul »