Bucketing Tables By Size

Kevin Feasel

2018-04-06

T-SQL

Bill Fellows has an interesting approach to bucketing tables into groups of similar size:

You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance—it really doesn’t matter. What does matter is that you’d like to get it done sooner rather than later. If time is no consideration, then you’d likely just do one table at a time until you’ve done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You’re paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I’m staging data in SSIS, I often use a row count as an approximation for a time cost. It’s not perfect – a million row table 430 columns wide might actually take longer than the 250 million row key-value table.
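To make the balancing problem in that excerpt concrete, here is a minimal Python sketch. It is not Bill's script. It assumes row count is the only cost signal and uses a simple greedy rule: take tables from largest to smallest and place each one in whichever bucket currently holds the fewest rows. The function name `bucket_tables` and the table names are hypothetical.

```python
import heapq


def bucket_tables(row_counts, bucket_count):
    """Greedy balancing sketch: largest table first, into the lightest bucket.

    row_counts: dict mapping table name -> row count (used as a cost proxy).
    bucket_count: number of parallel workers/buckets to fill.
    Returns a list of buckets, each a list of table names.
    """
    # Min-heap of (rows assigned so far, bucket index).
    heap = [(0, i) for i in range(bucket_count)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(bucket_count)]

    # Visit tables from most rows to fewest.
    for name, rows in sorted(row_counts.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)  # lightest bucket right now
        buckets[idx].append(name)
        heapq.heappush(heap, (total + rows, idx))
    return buckets


# Hypothetical row counts, for illustration only.
tables = {
    "dbo.KeyValue": 250_000_000,
    "dbo.WideTable": 1_000_000,
    "dbo.Orders": 120_000_000,
    "dbo.OrderLines": 90_000_000,
    "dbo.Customers": 40_000_000,
}
print(bucket_tables(tables, 2))
```

With these made-up numbers, the two buckets end up with 251 million and 250 million rows. As the excerpt notes, row count is only an approximation: a very wide table can cost more than its row count suggests, so a different weight (such as data size) could be swapped in for `rows`.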

Click through for the script. For the R version, this Stack Overflow post shows how to do it with cumulative sums and the cut function.
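The cumulative-sum idea can be sketched outside R as well. The Python below is a stand-in written for illustration, not the code from the Stack Overflow post. It keeps a running total of row counts and cuts that total into equal-width ranges, roughly what `cut` does with right-closed intervals. The function name `cumulative_buckets` is hypothetical.

```python
from itertools import accumulate


def cumulative_buckets(row_counts, bucket_count):
    """Assign each table a bucket index based on the running total of rows.

    row_counts: list of row counts, in the order the tables will be processed.
    bucket_count: number of buckets to split the running total into.
    Returns a list of bucket indexes (0-based), one per table.
    """
    total = sum(row_counts)
    if total == 0:
        # Nothing to weigh; put everything in the first bucket.
        return [0] * len(row_counts)

    result = []
    for running in accumulate(row_counts):
        # Which equal-width slice of (0, total] the running total falls in.
        idx = (running * bucket_count - 1) // total
        # Clamp so zero-row leading tables and rounding stay in range.
        result.append(max(0, min(bucket_count - 1, idx)))
    return result


# Hypothetical row counts, for illustration only.
print(cumulative_buckets([50, 30, 20, 60, 40], 2))  # [0, 0, 0, 1, 1]
```

Unlike a greedy assignment, this approach keeps the tables in their original order, so the balance it achieves depends on how the input is sorted.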

