Change Detection With Hashes

Kevin Feasel

2017-06-14

ETL

Nigel Meakins shows how to use HashBytes to roll your own change detection:

So this all sounds very promising as a way of tracking changes to our Data Warehouse data, for purposes such as extracting deltas, inserts and updates to Type I and II dimensions and so forth. It doesn’t have any show-stopping overhead for the hashing operations for the sizes of data typically encountered and storage isn’t going to be an issue. It is native to T-SQL so we can rerun our hash value generation in the engine where our data resides rather than having to push through SSIS or some other tool to generate this for us. Algorithms are universal and as such will give us the same values wherever used for the same bytes of input. Let’s go back to the basic idea for a minute and consider how we implement this.

This is particularly useful in cases where you have metadata columns you don’t much care about (e.g., last modified time).  I do recommend using CONCAT or CONCAT_WS (if you’re on SQL Server 2017) to do string concatenation, though; it’d remove the need for util.CastAsNVarchar and possibly more.

Related Posts

Contrasting Integration Services And Pentaho Data Integration

Koen Verbeeck contrasts SQL Server Integration Services with Pentaho Data Integration: For generating SSIS packages, you need to rely on Biml (much about that can be found on this blog or on the net), or older frameworks such as ezApi. Or you need 3rd party tools such as BimlStudio or TimeXtender. Using Biml means writing […]

Read More

Azure Data Factory Or Integration Services?

Teo Lachev contrasts use cases for Integration Services vesus Azure Data Factory V2: So, ADF was incorrectly positioned as “SSIS for the Cloud” and unfortunately once that message made it out there was a messaging problem that Microsoft has been fighting ever since. Like Azure ML, on the glory road to the cloud things that […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930