Press "Enter" to skip to content

Batching Large Data Operations via Key Ranges

Andy Brownsword updates or deletes a batch of rows:

Effective batching in general helps us by:

  • Reducing transaction length and minimising blocking
  • Avoiding unnecessary checking of the same rows repeatedly
  • Introducing graceful pacing to reduce impact on busy environments or data replication

I’m not the biggest fan of the OFFSET/FETCH combination there, at least if your key column is fairly well packed—like, say, 99+% of the rows are contiguous and you occasionally have a jump of a few thousand rows. Also, that batch size of 100K might be a little high, although that will certainly depend on what the operation is. Batch updating a column based on some fairly straightforward calculation? You can probably get away with 100K, though I’d still prefer 10K. But as you add more complexities (deleting rows, very high server throughput, triggers, limited hardware, etc.), that number should edge downward.
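To make the key-range alternative concrete, here is a minimal sketch of that style of loop. The table dbo.Orders, its clustered key OrderId, and the retention filter are all hypothetical stand-ins; the point is that we walk the key in fixed-width ranges rather than re-scanning via OFFSET/FETCH:

DECLARE @BatchSize int = 10000;  -- conservative starting point; tune for the workload
DECLARE @CurrentId bigint, @MaxId bigint;

-- Establish the key range once up front
SELECT @CurrentId = MIN(OrderId), @MaxId = MAX(OrderId)
FROM dbo.Orders;

WHILE @CurrentId <= @MaxId
BEGIN
    -- Each batch touches one contiguous slice of the clustered key
    DELETE FROM dbo.Orders
    WHERE OrderId >= @CurrentId
      AND OrderId < @CurrentId + @BatchSize
      AND OrderDate < DATEADD(YEAR, -7, GETDATE());  -- hypothetical retention rule

    SET @CurrentId += @BatchSize;

    WAITFOR DELAY '00:00:01';  -- graceful pacing between batches
END

Because the key is well packed, each iteration touches roughly @BatchSize rows, and the occasional gap of a few thousand values just means one batch runs small. No batch ever re-reads rows an earlier batch already handled.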
