Chris Williams starts a series on slowly changing dimensions in a Delta Lake:
Anyone that has contributed towards a Data Warehouse or a dimensional model in Power BI will know the distinction made between the time-series metrics of a Fact Table and the categorised attributes of a Dimension Table. These dimensions are also affected by the passage of time and require revised descriptions periodically which is why they are known as Slowly Changing Dimensions (SCD). See The Data Warehouse Toolkit – Kimball & Ross for more information.
Here is where the Delta Lake comes in. Using its many features such as support for ACID transactions (Atomicity, Consistency, Isolation and Durability) and schema enforcement we can create the same durable SCD’s. This may have required a series of complicated SQL statements in the past to achieve this. I will now discuss a few of the most common SCD’s and show how they can be easily achieved using a few Databricks Notebooks, which are available from my GitHub repo so you can download and have a go:
Check out the repo, but be sure to read the whole post.
Comments closed