Koos van Strien walks through some of the high-level concepts when automating data warehouse tests:
In my current project, I’ve got a database containing everything to perform these tests:
- Tables with identical structure to the ones in the staging area (plus two columns “TestSuiteName” and “TestName”)
- A table containing the mapping from test-input table to target database, schema and table
- A stored procedure to purge the DWH (all layers) in the test environment
- A stored procedure to insert the data for a specific testsuite / name
When preparing a specific test case (the “insert rows for test case” step from the diagram above), the rows needed for that case are copied into the DWH:
Testing warehouses is certainly not a trivial exercise but given how complex warehouse ETL tends to be, having good tests reduces the number of 3 AM pages.