Bucketing Tables By Size

Kevin Feasel

2018-04-06

T-SQL

Bill Fellows has an interesting approach to bucketing tables into groups of similar size:

You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance—it really doesn’t matter. What does matter is that you’d like to get it done sooner rather than later. If time is no consideration, then you’d likely just do one table at a time until you’ve done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You’re paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I’m staging data in SSIS, I often use a row count as an approximation for a time cost. It’s not perfect – a million row table 430 columns wide might actually take longer than the 250 million row key-value table.

Click through for the script.  For the R version, this Stack Overflow post shows how to do it with cumulative sums and the cut function.

Related Posts

Calculating Median In SQL Server 2019

Tomaz Kastrun shows that batch aggregation mode on window functions allow PERCENTILE_CONT finally to become useful: Next query, for median calculation was a window function query. SELECT DISTINCT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY c1) OVER (PARTITION BY (SELECT 1)) AS MedianCont FROM t1 To my surprise, the performance was even worse, and at this time, […]

Read More

Generating SQL With Biml

Cathrine Wilhelmsen shows us you can do a lot more with Biml than just generating SSIS packages: This actually happened to me in a previous job. We had a fairly complex ETL solution for the most critical part of our Data Warehouse. Many SSIS packages, views, and stored procedures queried the tables that were replicas […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30