Warehouse History

Kennie Pontoppidan delves into various aspects of collecting and storing history in warehouses:

In T2 history we have the two attributes ValidFromDate and ValidToDate. We can choose two different strategies for updating the values of these: using system time (load time) or business time. If we use system time for the T2 splits, the data warehouse history is dependent on when we load data. This makes it impossible to reload data in the data warehouse without messing up the data history. If we allow our load ETL procedures to use timestamps for business time (when data was really valid) for T2 history, we get the opportunity to reload data. But the cost of this flexibility is a much more complicated design for T2 splits. We also need to keep track of this metadata on the source system attributes.
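To make the trade-off concrete, here is a minimal Python sketch of a T2 split. It is not Kennie's implementation. The `Row` type, the `city` attribute, the `t2_split` and `load` helpers, and the 9999-12-31 "open end" date are all hypothetical. The sketch also assumes that changes arrive in business-time order. The split logic is the same for both strategies. The only difference is which date stamps the split.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical convention for the ValidToDate of the current (open) version.
OPEN_END = date(9999, 12, 31)

@dataclass(frozen=True)
class Row:
    key: str          # business key, e.g. a customer number (hypothetical)
    city: str         # the attribute we track with T2 history (hypothetical)
    valid_from: date  # ValidFromDate
    valid_to: date    # ValidToDate

def t2_split(history, key, city, stamp):
    """Close the open version of `key` at `stamp` and open a new one from `stamp`.

    `stamp` is either the load date (system time) or the date the change
    really happened in the source (business time).
    """
    out = []
    for row in history:
        if row.key == key and row.valid_to == OPEN_END:
            if row.city == city:
                return list(history)  # attribute unchanged: no split needed
            out.append(Row(row.key, row.city, row.valid_from, stamp))
        else:
            out.append(row)
    out.append(Row(key, city, stamp, OPEN_END))
    return out

def load(changes, use_business_time, load_date):
    """Replay (key, city, business_date) changes into an empty history."""
    history = []
    for key, city, business_date in changes:
        stamp = business_date if use_business_time else load_date
        history = t2_split(history, key, city, stamp)
    return history

changes = [("C1", "Oslo", date(2016, 1, 10)), ("C1", "Aarhus", date(2016, 3, 5))]
reload_business = load(changes, use_business_time=True, load_date=date(2016, 6, 1))
reload_system = load(changes, use_business_time=False, load_date=date(2016, 6, 1))
```

When the same two changes are replayed with business time, the result is the same history no matter when the reload runs. The system-time version stamps both versions with the reload date, so the Oslo row collapses to zero length. Changes that arrive out of business-time order (late-arriving data) are where the extra complexity Kennie mentions comes in, and this sketch does not attempt to handle them.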

Part of a warehouse’s value is its ability to replay historical data, but you can only do that if you store the data correctly (and query it correctly!).
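On the querying side, a point-in-time ("as of") lookup shows one way history can be read wrong. Here is a minimal Python sketch. It assumes a hypothetical row shape in which the ValidToDate of a closed version equals the ValidFromDate of the next version. Under that convention, each version covers the half-open interval [ValidFromDate, ValidToDate). The `Row`, `as_of`, and `versions_inclusive` names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Row:
    key: str
    city: str
    valid_from: date  # ValidFromDate, inclusive
    valid_to: date    # ValidToDate, exclusive (equals the next version's ValidFromDate)

def as_of(history, key, when):
    """Return the single version of `key` valid on `when`, or None if none existed."""
    for row in history:
        if row.key == key and row.valid_from <= when < row.valid_to:
            return row
    return None

def versions_inclusive(history, key, when):
    """A common mistake: a BETWEEN-style check that includes both endpoints."""
    return [r for r in history
            if r.key == key and r.valid_from <= when <= r.valid_to]

history = [
    Row("C1", "Oslo", date(2016, 1, 10), date(2016, 3, 5)),
    Row("C1", "Aarhus", date(2016, 3, 5), date(9999, 12, 31)),
]
```

On the exact change date (2016-03-05), `as_of` returns only the Aarhus version. The inclusive predicate matches both versions, which would double-count the customer in any aggregate built on it.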

