Bucketing Tables By Size

Kevin Feasel

2018-04-06

T-SQL

Bill Fellows has an interesting approach to bucketing tables into groups of similar size:

You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance—it really doesn’t matter. What does matter is that you’d like to get it done sooner rather than later. If time is no consideration, then you’d likely just do one table at a time until you’ve done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You’re paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I’m staging data in SSIS, I often use a row count as an approximation for a time cost. It’s not perfect – a million row table 430 columns wide might actually take longer than the 250 million row key-value table.

Click through for the script.  For the R version, this Stack Overflow post shows how to do it with cumulative sums and the cut function.

Related Posts

Debugging a Pivot

Ed Elliott takes us through problems with the PIVOT statement: If you have a PIVOT query and it isn’t returning the data you expect, what can you do to troubleshoot it? The thing to do is to break it down into the constituent parts. First, lets take a look at a query and see what […]

Read More

Avoiding DONE Tokens in Loops

Emanuele Meazzo shows one reason why loops can be so much slower in T-SQL: Not everybody knows that SQL Server sends a DONE Token to the client each time that a SQL statement completes (so, everything except variable declarations); For the query above you can basically track it with extended events by tracking the “SQL Statement Completed” […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30