Press "Enter" to skip to content

Using Delimiters when Performing Change Detection with Hashbytes

Andy Leonard clues us in on an important character:

A student of my SSIS training asked why I place vertical pipe delimiters between column values when calculating the hash value for a row. I sent back a quick response and then informed him that he’d inspired a blog post.

Hence, this blog post.

To understand what I am talking about, you need to know that I define a 100% efficient data integration operation as a data integration process – such as an SSIS package or ADF pipeline – that loads only new and changed rows from a source. I explain this in the video. To achieve 100% efficiency, I use the T-SQL HashBytes function for change detection.

Check out the full reasoning. A secondary reason for using delimiters is that if you’re using the CONCAT() function to combine rather than calling CAST() or CONVERT() on everything individually, CONCAT() requires at least two inputs. If you always put a delimiter at the end as well, you have at least two inputs, guaranteed. It’s a small thing, but adds to consistency when, say, you only have one non-key, type 2 field on a dimension.