Spring Cleaning for Lakehouse Tables with VACUUM

Chen Hirsh says it’s time to do a bit of cleanup:

Delta tables create new files for every change made to the table (insert, update, delete). You can use the old files to “time travel” – to query or restore older versions of your table. This is a wonderful feature, but over time, these files accumulate in your storage and will increase your storage costs.
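To make that accumulation concrete, here is a toy model in Python. It assumes a much-simplified transaction log, and `ToyDeltaTable`, `commit`, and `files_at` are hypothetical names, not the Delta Lake API. Each commit adds some data files and logically removes others. Nothing is physically deleted, which is what makes time travel possible and also what makes storage grow.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Hypothetical, simplified model of a Delta table's transaction log.

    Each commit records which data files it adds and which it logically
    removes. Removed files stay in storage so older versions remain readable.
    """
    commits: list = field(default_factory=list)  # list of (added, removed) sets
    storage: set = field(default_factory=set)    # every file physically present

    def commit(self, added, removed=()):
        """Record a change (insert/update/delete) and return its version number."""
        self.commits.append((set(added), set(removed)))
        self.storage |= set(added)  # writes only add files; none are deleted
        return len(self.commits) - 1

    def files_at(self, version):
        """Replay the log up to `version` to get that version's live files (time travel)."""
        live = set()
        for added, removed in self.commits[: version + 1]:
            live = (live - removed) | added
        return live


# Usage: an insert, then an update that rewrites the data into a new file.
table = ToyDeltaTable()
v0 = table.commit({"part-0.parquet"})
v1 = table.commit({"part-1.parquet"}, removed={"part-0.parquet"})
print(table.files_at(v0))     # the old version is still readable
print(sorted(table.storage))  # both files still occupy storage
```

The current version needs only `part-1.parquet`, yet `part-0.parquet` is kept around for time travel. Multiply that by every change to a busy table and the storage bill grows accordingly.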

Read on for a primer on the VACUUM command, how frequently you might want to run it, and how much history you want to retain. The example is specific to Databricks, but the mechanisms work the same way for other lakehouses like Microsoft Fabric.
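In Delta SQL the cleanup itself is a single statement, for example `VACUUM my_table RETAIN 168 HOURS`, where `my_table` is a placeholder. The sketch below models only the retention rule. It assumes VACUUM deletes data files that are no longer referenced by the current table version and were removed longer ago than the retention window (Delta's default is 7 days, or 168 hours). `vacuum_candidates` and its arguments are hypothetical, and the real command also handles cases this ignores, such as untracked files in the table directory.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION_HOURS = 168  # Delta's default retention window: 7 days


def vacuum_candidates(tombstones, now, retention_hours=DEFAULT_RETENTION_HOURS):
    """Return the files a VACUUM-like cleanup would delete (hypothetical model).

    `tombstones` maps a data file name to the time it was logically removed
    from the table. Files still referenced by the current version are assumed
    to be absent from this mapping, so they are never selected.
    """
    cutoff = now - timedelta(hours=retention_hours)
    return sorted(name for name, removed_at in tombstones.items() if removed_at < cutoff)


# Usage: one file removed 10 days ago, another removed 2 days ago.
now = datetime(2024, 4, 15, 12, 0)
tombstones = {
    "part-0.parquet": now - timedelta(days=10),
    "part-1.parquet": now - timedelta(days=2),
}
print(vacuum_candidates(tombstones, now))                      # default 7-day window
print(vacuum_candidates(tombstones, now, retention_hours=24))  # aggressive 1-day window
```

A shorter retention window frees storage sooner, but any table version that needs a deleted file can no longer be queried or restored with time travel. That trade-off is what the choice of retention period comes down to.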
