Randomization With NEWID()

Kevin Feasel

2018-06-18

T-SQL

Michael J. Swart tests whether ORDER BY NEWID() produces a biased result:

One of his articles, Visualizing Algorithms has some thoughts on shuffling at https://bost.ocks.org/mike/algorithms/#shuffling.

He says that sorting using a random comparator is a rotten way to shuffle things. Not only is it inefficient, but the resulting shuffle is really really biased. He goes on to visualize that bias (again, I really encourage you to go see his stuff).

Ordering by random reminded me of the common technique in SQL Server of ORDER BY newid(). So I wondered whether an obvious bias was present there. So I shuffled 100 items thousands of times and recreated the visualization of bias in a heat map (just like Mike did).

I’ve used this pattern to good effect, but definitely pay attention to Michael’s warning at the end.

Related Posts

Enabling Database-Level Change Tracking

Tim Weigel continues a series on change tracking: If you don’t provide a retention period, SQL Server’s default is 2 days. Auto-cleanup defaults to ON unless you tell it otherwise. Easy! The table level commands aren’t any more complicated. Before we get started, please note that change tracking requires a primary key on the table […]

Read More

Isolation Levels and Dynamic SQL

Max Vernon points out how transaction isolation levels work when combined with sp_executesql: Imagine you have a piece of code where you don’t care about the downsides to the “read uncommitted” isolation level, and do your due diligence by adding SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; at the start of your code. The code following that statement […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930