Testing ETL Pipelines

Ed Elliott has started a new series on testing ETL pipelines:

We test in production, this means we have monitoring and do things like have phased roll-outs using feature flags, or we roll-out to select customers first, prove it then roll it out to everyone else. Testing in production doesn’t mean hacking around getting some process to work. We don’t test “on production” (hacking), we test “in production” – while we are in production we are continually testing, and if anything goes wrong, we have alerts and can deal with it.

Testing pipelines feels difficult because there are so many moving pieces, but if you design for testability (e.g., being able to tee off samples of data, send test records through, etc.), things get easier.

Related Posts

Proving ETL Correctness

Ed Elliott shares a few techniques for testing ETL processes: Reconciliation is the process of going to your source system, getting a number and validating that number on the target. This ranges from being easy to impossible, so you need to decide what to reconcile on a case by case basis. In its simplest form, […]

Read More

SSIS Design Preferences

Meagan Longoria systematizes a set of preferences regarding Integration Services package and ETL process design: – Every table should have InsertDateTime and UpdateDateTime columns. The UpdateDateTime column should be populated with the same value as the InsertDateTime column upon creation of the row, rather than being left null.– Whatever you use to create tables, include […]

Read More

Categories

July 2019
MTWTFSS
« Jun Aug »
1234567
891011121314
15161718192021
22232425262728
293031