Change Detection With Hashes

Kevin Feasel



Nigel Meakins shows how to use HashBytes to roll your own change detection:

So this all sounds very promising as a way of tracking changes to our Data Warehouse data, for purposes such as extracting deltas, inserts and updates to Type I and II dimensions and so forth. It doesn’t have any show-stopping overhead for the hashing operations for the sizes of data typically encountered and storage isn’t going to be an issue. It is native to T-SQL so we can rerun our hash value generation in the engine where our data resides rather than having to push through SSIS or some other tool to generate this for us. Algorithms are universal and as such will give us the same values wherever used for the same bytes of input. Let’s go back to the basic idea for a minute and consider how we implement this.

This is particularly useful in cases where you have metadata columns you don’t much care about (e.g., last modified time). I do recommend using CONCAT, or CONCAT_WS if you’re on SQL Server 2017 or later, to do string concatenation, though; it’d remove the need for util.CastAsNVarchar and possibly more.
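To make the idea concrete, here is a minimal sketch of hash-based change detection in Python rather than T-SQL. It is not the code from Nigel's post: the function names, the separator, the NULL marker, and the sample rows are all made up for illustration, and SHA-256 stands in for whichever algorithm you would pass to HashBytes. The point is only to show tracked columns being joined with a delimiter, hashed, and compared against a stored hash while a last-modified column is left out.

```python
import hashlib

# Hypothetical separator and NULL marker (assumptions for this sketch);
# pick values that cannot appear in the real data.
SEP = "\x1f"
NULL = "\x00"


def row_hash(row, tracked):
    """Return a SHA-256 hex digest over the tracked columns, in a fixed order."""
    parts = [NULL if row[col] is None else str(row[col]) for col in tracked]
    return hashlib.sha256(SEP.join(parts).encode("utf-8")).hexdigest()


def detect_changes(stored_hashes, incoming, key, tracked):
    """Split incoming rows into new keys and keys whose tracked columns changed."""
    inserts, updates = [], []
    for row in incoming:
        new_hash = row_hash(row, tracked)
        old_hash = stored_hashes.get(row[key])
        if old_hash is None:
            inserts.append(row[key])   # key not seen before
        elif old_hash != new_hash:
            updates.append(row[key])   # a tracked column differs
    return inserts, updates


# Only these columns feed the hash; "modified" is deliberately ignored.
tracked = ["name", "city"]

existing = [
    {"id": 1, "name": "Ann", "city": "Leeds", "modified": "2017-06-01"},
    {"id": 2, "name": "Bob", "city": "York", "modified": "2017-06-01"},
]
stored = {r["id"]: row_hash(r, tracked) for r in existing}

incoming = [
    {"id": 1, "name": "Ann", "city": "Leeds", "modified": "2017-06-20"},  # metadata only
    {"id": 2, "name": "Bob", "city": "Hull", "modified": "2017-06-20"},   # city changed
    {"id": 3, "name": "Cy", "city": None, "modified": "2017-06-20"},      # new row
]

inserts, updates = detect_changes(stored, incoming, "id", tracked)
print(inserts, updates)  # [3] [2]
```

Row 1 is not flagged because only its last modified time moved. The explicit NULL marker matters if you port this to CONCAT_WS, which skips NULL arguments: without some placeholder (ISNULL or similar on nullable columns), a row with values (NULL, 'x') would hash the same as one with ('x', NULL).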



June 2017