Proving ETL Correctness

Ed Elliott shares a few techniques for testing ETL processes:

Reconciliation is the process of going to your source system, getting a number and validating that number on the target. This ranges from being easy to impossible, so you need to decide what to reconcile on a case by case basis.

In its simplest form, we can go to a source system and find out things like how many records are to be copied, sum up totals and run other aggregations that we can then validate as correct (or not!) on the target system.

Ed has put together a thoughtful approach to validating data loads regardless of the source.

Related Posts

SSIS Design Preferences

Meagan Longoria systematizes a set of preferences regarding Integration Services package and ETL process design: – Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.– Whatever you use to create tables, include […]

Read More

Dimensional Load with Databricks

Leo Furlong shows how we can load an Azure SQL Data Warehouse dimension with Databricks: Ingesting data into the Data Lake occurs in steps 1 and 2 in our architecture.  Azure Data Factory (ADF) provides an excellent mechanism for loading data from source applications into a Data Lake stored in Azure Data Lake Store Gen2.  […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

August 2019
MTWTFSS
« Jul  
 1234
567891011
12131415161718
19202122232425
262728293031