Incremental Loading Using Datetime Columns

Reza Rad shows a pattern for implementing incremental loads using modified dates:

The idea behind this method is to store the latest ETL run time in a config or log table, and then in the next ETL run just load records from the source table that have been modified since then (with their modified date greater than or equal to the latest ETL run datetime). This will create the change set for the data table. The change set might contain inserted, updated, or deleted records. To identify which change happened on each record, you need to compare the change set with existing records and separate inserted, updated, and deleted records. This change set, with the action on each record, can be inserted into staging tables and then applied to the fact table based on the appropriate action.

In my experience, the hardest part about this is making sure people update ModifiedTime when they update rows in the table.
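
To make the pattern quoted above concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not Reza's implementation. The table names (source_orders, target_orders, etl_log), the columns, and the is_deleted soft-delete flag are all assumptions made for illustration, and the sketch applies changes directly to the target rather than going through staging tables. A modified-date filter can only see deletions if the source keeps a marker row, which is why the soft-delete flag is assumed here.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with hypothetical source, target, and log tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL,
                                    is_deleted INTEGER DEFAULT 0, modified_time TEXT);
        CREATE TABLE target_orders (id INTEGER PRIMARY KEY, amount REAL);
        CREATE TABLE etl_log (last_run TEXT);
        INSERT INTO etl_log VALUES ('2016-01-01T00:00:00');
        INSERT INTO target_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
        INSERT INTO source_orders VALUES
            (1, 10.0, 0, '2015-12-30T08:00:00'),  -- unchanged since the last run
            (2, 25.0, 0, '2016-01-02T09:00:00'),  -- updated
            (3, 30.0, 1, '2016-01-02T10:00:00'),  -- soft-deleted (assumed flag)
            (4, 40.0, 0, '2016-01-03T11:00:00');  -- new
    """)
    return conn


def incremental_load(conn, run_time):
    """Load rows modified since the last logged run; return the action taken per id."""
    (last_run,) = conn.execute("SELECT last_run FROM etl_log").fetchone()

    # Change set: everything modified at or after the last ETL run time.
    # ISO-8601 strings in one consistent format compare correctly as text.
    change_set = conn.execute(
        "SELECT id, amount, is_deleted FROM source_orders WHERE modified_time >= ?",
        (last_run,),
    ).fetchall()

    # Compare against existing target rows to separate inserts from updates.
    existing = {row[0] for row in conn.execute("SELECT id FROM target_orders")}

    actions = {}
    for row_id, amount, is_deleted in change_set:
        if is_deleted:
            actions[row_id] = "delete"
            conn.execute("DELETE FROM target_orders WHERE id = ?", (row_id,))
        elif row_id in existing:
            actions[row_id] = "update"
            conn.execute(
                "UPDATE target_orders SET amount = ? WHERE id = ?", (amount, row_id)
            )
        else:
            actions[row_id] = "insert"
            conn.execute("INSERT INTO target_orders VALUES (?, ?)", (row_id, amount))

    # Record this run's time so the next run starts from here.
    conn.execute("UPDATE etl_log SET last_run = ?", (run_time,))
    conn.commit()
    return actions
```

Calling incremental_load(make_demo_db(), '2016-01-04T00:00:00') classifies row 2 as an update, row 3 as a delete, and row 4 as an insert, and leaves row 1 alone. Because the filter uses greater than or equal to, a row stamped exactly at the watermark can be picked up twice, so the apply step should be safe to repeat. And none of this works if ModifiedTime is not maintained on every write.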

