Incremental Loading Using Datetime Columns

Reza Rad shows a pattern for implementing incremental loads using modified dates:

The idea behind this method is to store the latest ETL run time in a config or log table, and then in the next ETL run just load records from the source table that have modified (with their modified date greater than or equal to) after the latest ETL run datetime. This will create the change set for the data table. The change set might contains inserted, updated, or deleted records. to identify which change happened on the record you need to compare the change set with existing records and separate inserted, updated, and deleted records. This change set with the action on each record can be inserted into staging tables, and then be used to apply on the fact table based on appropriate action.

In my experience, the hardest part about this is making sure people update ModifiedTime when they update rows in the table.

Related Posts

Loading Data Into SnowflakeDB

Dan Bilsborough shows a couple ways of loading data into SnowflakeDB from Azure: Before being loaded into a Snowflake table, the data can be optionally staged, which is essentially just a pointer to a location where the files are stored. There are different types of stages including:– User stages, which each user will have by […]

Read More

Converting Existing SSIS Packages to Biml

David Stein shows off a conversion tool built into BimlExpress: BimlExpress is a free Visual Studio add-in created by the good folks at Varigence. Its a full featured Biml editor which allows you to dynamically create SSIS packages. It was first released back in 2017, and the latest version is 2019 (of course). The current version […]

Read More

Categories

January 2016
MTWTFSS
« Dec Feb »
 123
45678910
11121314151617
18192021222324
25262728293031