Press "Enter" to skip to content

Creating Evenly-Sized Batches from Groups

Daniel Hutmacher has a variant on the islands problem as well as the bin-packing problem:

My aim with this post is to split the dataset into batches of roughly 100 rows each.

DECLARE @target_rowcount bigint=100;

I say “roughly”, because we’re not allowed to split a transaction so that a group (grouping_column_1, grouping_column_2) appears in more than one batch, although a batch can obviously contain more than one group. This means that by necessity, some of the batches are going to be slightly under 100 rows and some are going to be slightly over.

Read on for a good solution to the problem. Daniel mentions places where performance could be better, though this feels like the kind of task you don’t necessarily run all that frequently.