Warehouse History

Kennie Pontoppidan delves into various aspects of collecting and storing history in warehouses:

In T2 history we have the two attributes ValidFromDate and ValidToDate. We can choose two different strategies for updating the values of these: using system time (load time) or business time. If we use system time for the T2 splits, the data warehouse history is dependent on when we load data. This makes it impossible to reload data in the data warehouse without messing up the data history. If we allow our load ETL procedures to use timestamps for business time (when data was really valid) for T2 history, we get the opportunity to reload data. But the cost of this flexibility is a much more complicated design for T2 splits. We also need to keep track of this metadata on the source system attributes.

Part of a warehouse’s value is its ability to replay historical data, but you can only do that if you store the data correctly (and query it correctly!).

Related Posts

Microservices With Kafka Streams

Ben Stopford walks us through a microservices architecture built on top of Kafka: So we can use the Kafka Streams API to piece together complex business systems as a collection of asynchronously executing, event-driven services. The differentiator here is the API itself, which is far richer than, say, the Kafka Producer or Consumer. It makes […]

Read More

Is Azure SQL DW A Good Fit For You?

Melissa Coates has a nice choose-your-own-adventure story around Azure SQL Data Warehouse: Q4: How large is your database? It is difficult to pinpoint an exact number for the absolute minimum size recommended for Azure SQL DW. Many data professionals in the industry see the minimum “practical” data size for Azure SQL DW in the 1-4TB […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930