Change Detection With Hashes

Kevin Feasel

2017-06-14

ETL

Nigel Meakins shows how to use HashBytes to roll your own change detection:

So this all sounds very promising as a way of tracking changes to our Data Warehouse data, for purposes such as extracting deltas, inserts and updates to Type I and II dimensions and so forth. It doesn’t have any show-stopping overhead for the hashing operations for the sizes of data typically encountered and storage isn’t going to be an issue. It is native to T-SQL so we can rerun our hash value generation in the engine where our data resides rather than having to push through SSIS or some other tool to generate this for us. Algorithms are universal and as such will give us the same values wherever used for the same bytes of input. Let’s go back to the basic idea for a minute and consider how we implement this.

This is particularly useful in cases where you have metadata columns you don’t much care about (e.g., last modified time).  I do recommend using CONCAT or CONCAT_WS (if you’re on SQL Server 2017) to do string concatenation, though; it’d remove the need for util.CastAsNVarchar and possibly more.

Related Posts

Executing SSIS From Azure Data Factory

Andy Leonard shows us how to execute an SSIS package from Azure Data Factory: The good people who work on Azure Data Factory recently added an Execute SSIS Package activity. It’s pretty cool. Let’s tinker with it some, shall we? First, you will need to create an Azure Data Factory SSIS Integration Runtime. If you don’t know how, that’s […]

Read More

BCP And Multiple SQL Server Instances

Kevin Feasel

2018-06-11

ETL

Manoj Pandey investigates an interesting issue with BCP: I observed one thing here with BCP (Bulk Copy Program), when you have 2 versions of SQL Server installed on you PC or Server. I had SQL Server 2014 & 2016 installed on one of my DEV server. So if you are executing Query from SQL 2016 […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930