Table Sampling

Kevin Feasel

2015-12-17

T-SQL

Ginger Grant shows a couple of techniques for sampling from tables:

The random sample that TABLESAMPLE provides is based on the number of data pages, not the number of records. If you want the number of rows to be specifically limited you will need to use Top(n) as well. I’ve written all my samples based upon AdventureWorksDW so you can run them for yourself later. I’ve listed the variety of ways to call TABLESAMPLE and shown the number of records returned.

TABLESAMPLE is useful for spelunking, but is somewhat limited otherwise.

Related Posts

Multiple SYSDATETIME In The Same SELECT May Give Unexpected Results

Louis Davidson walks through a scenario he experienced: The data is exactly as expected, even though the other two calls would have returned .902 and .903 if simply rounded off. On the other hand, looking for differences between the time1_3 and time2_3 columns: SELECT * FROM #hold WHERE DATEDIFF(MILLISECOND,time1_3,time2_3) <> –1; Returns 133 rows. With […]

Read More

Randomization With NEWID()

Michael J. Swart tests whether ORDER BY NEWID() produces a biased result: One of his articles, Visualizing Algorithms has some thoughts on shuffling at https://bost.ocks.org/mike/algorithms/#shuffling. He says that sorting using a random comparator is a rotten way to shuffle things. Not only is it inefficient, but the resulting shuffle is really really biased. He goes on to visualize that […]

Read More

Categories

December 2015
MTWTFSS
« Nov Jan »
 123456
78910111213
14151617181920
21222324252627
28293031