Bucketing Tables By Size

Kevin Feasel

2018-04-06

T-SQL

Bill Fellows has an interesting approach to bucketing tables into groups of similar size:

You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance—it really doesn’t matter. What does matter is that you’d like to get it done sooner rather than later. If time is no consideration, then you’d likely just do one table at a time until you’ve done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You’re paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I’m staging data in SSIS, I often use a row count as an approximation for a time cost. It’s not perfect – a million row table 430 columns wide might actually take longer than the 250 million row key-value table.

Click through for the script.  For the R version, this Stack Overflow post shows how to do it with cumulative sums and the cut function.

Related Posts

Understanding ANY And ALL In SQL

Doug Kline explains the ANY and ALL operators in SQL: -- note that this creates a single column of values -- which could be used in something like IN -- for example SELECT 1 WHERE 12 IN ( SELECT tempField FROM (VALUES(11),(12),(7)) tempTable(tempField)) -- I could rephrase this as: SELECT 1 WHERE 12 = ANY […]

Read More

Validating SSIS Packages Using T-SQL

Annie Xu shows us how to validate SSIS packages in the SSISDB catalog using T-SQL: Recently, I need to do a data warehouse migration for a client. Since there might be some difference between the Dev environment source databases and Prod environment source databases. The migrated SSIS packages for building data warehouse might have some […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30