Randomization With NEWID()

Kevin Feasel

2018-06-18

T-SQL

Michael J. Swart tests whether ORDER BY NEWID() produces a biased result:

One of [Mike Bostock's] articles, Visualizing Algorithms, has some thoughts on shuffling at https://bost.ocks.org/mike/algorithms/#shuffling.

He says that sorting using a random comparator is a rotten way to shuffle things. Not only is it inefficient, but the resulting shuffle is really really biased. He goes on to visualize that bias (again, I really encourage you to go see his stuff).

Ordering by random reminded me of the common technique in SQL Server of ORDER BY newid(). So I wondered whether an obvious bias was present there. So I shuffled 100 items thousands of times and recreated the visualization of bias in a heat map (just like Mike did).

I’ve used this pattern to good effect, but definitely pay attention to Michael’s warning at the end.
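To get a feel for the difference between the two approaches without a SQL Server instance, here is a rough Python sketch. It is a simulation, not Michael's test: it assumes that ORDER BY NEWID() behaves like giving each row its own independent random key and sorting by that key, and it uses Python's built-in sort as a stand-in for the random-comparator shuffle. The function names (`shuffle_by_random_key`, `shuffle_by_random_comparator`, `position_counts`, `max_relative_deviation`) are made up for illustration, and the exact shape of the comparator bias depends on the sort algorithm, so it will not match the heat maps in either article.

```python
import random
from functools import cmp_to_key


def shuffle_by_random_key(items, rng):
    """Assumed analogue of ORDER BY NEWID(): one random key per row, then sort by key."""
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed]


def shuffle_by_random_comparator(items, rng):
    """The approach the quoted article warns about: every comparison is a coin flip."""
    coin_flip = cmp_to_key(lambda a, b: rng.choice((-1, 1)))
    return sorted(items, key=coin_flip)


def position_counts(shuffle, n, trials, rng):
    """counts[item][position] = how often `item` ended up at `position`."""
    counts = [[0] * n for _ in range(n)]
    for _ in range(trials):
        for position, item in enumerate(shuffle(list(range(n)), rng)):
            counts[item][position] += 1
    return counts


def max_relative_deviation(counts, trials):
    """Largest relative gap between a cell and the uniform expectation trials / n."""
    expected = trials / len(counts)
    return max(abs(cell - expected) / expected for row in counts for cell in row)


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed so repeated runs are comparable
    n, trials = 10, 5000
    for name, shuffle in [
        ("random key", shuffle_by_random_key),
        ("random comparator", shuffle_by_random_comparator),
    ]:
        counts = position_counts(shuffle, n, trials, rng)
        print(name, round(max_relative_deviation(counts, trials), 3))
```

The `counts` matrix is the raw data behind a heat map like the ones described above: with an unbiased shuffle every cell should sit close to `trials / n`, so a large maximum deviation is a hint that some items favor some positions.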
