Randomization With NEWID()

Kevin Feasel

2018-06-18

T-SQL

Michael J. Swart tests whether ORDER BY NEWID() produces a biased result:

One of his articles, Visualizing Algorithms has some thoughts on shuffling at https://bost.ocks.org/mike/algorithms/#shuffling.

He says that sorting using a random comparator is a rotten way to shuffle things. Not only is it inefficient, but the resulting shuffle is really really biased. He goes on to visualize that bias (again, I really encourage you to go see his stuff).

Ordering by random reminded me of the common technique in SQL Server of ORDER BY newid(). So I wondered whether an obvious bias was present there. So I shuffled 100 items thousands of times and recreated the visualization of bias in a heat map (just like Mike did).

I’ve used this pattern to good effect, but definitely pay attention to Michael’s warning at the end.

Related Posts

Dealing With Large JSON Values

Bert Wagner investigates an issue he found where his long JSON strings were becoming NULL in SQL Server: After a little bit more research, I discovered that the return type for JSON_VALUE is limited to 4000 characters.   Since JSON_VALUE is in lax mode by default, if the output has more than 4000 characters, it fails silently. […]

Read More

Deleting Top Records With An Order By Clause

Kenneth Fisher shows that deleting the top N records with an ORDER BY clause is not straightforward: Did you know you can’t do this? DELETE TOP (10) FROM SalesOrderDetail ORDER BY SalesOrderID DESC; Msg 156, Level 15, State 1, Line 8 Incorrect syntax near the keyword ‘ORDER’. I didn’t. Until I tried it anyway. Turns […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930