
CHECKSUM And BINARY_CHECKSUM Collisions

Jovan Popovic shows us how fragile CHECKSUM and BINARY_CHECKSUM are:

You can find patterns of inputs that return the same values. This is similar to the modulo operator, which returns the same remainder if you find the right pattern of numbers.

As an example, if you run the following queries, you will find that these functions return 0 for all strings containing only the letter ‘a’ that have lengths of 32, 64, 96, or any n * 32 characters:

This isn’t the only place where you can get data collisions with these two functions; because they return ints, they’re limited to 32 bits. Jeff Preshing has a great chart on hash collisions, showing that if you have 77,163 separate values hashed to 32 bits, there’s a 50% chance two will collide.
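To see where a number like 77,163 comes from, here is a minimal sketch of the standard birthday approximation, assuming hash outputs are uniformly distributed (which, per the patterns above, is generous to CHECKSUM). The function name `collision_probability` is made up for this illustration.

```python
import math

def collision_probability(n_values: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_values uniformly random hashes collide."""
    space = 2.0 ** hash_bits  # number of distinct hash outputs
    # Birthday approximation: p = 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n_values * (n_values - 1) / (2.0 * space))

if __name__ == "__main__":
    # A 32-bit output matches the int returned by CHECKSUM and BINARY_CHECKSUM.
    for n in (1_000, 10_000, 77_163, 200_000):
        print(f"{n:>7,} values: {collision_probability(n, 32):.4f}")
```

At 77,163 values the estimate sits right around one half, and it climbs quickly from there.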

Use HASHBYTES instead. The likelihood of an accidental MD5 collision is pretty low (MD5 is 128 bits, so it takes roughly 2.6 x 10^18 records before a 1% chance of collision; even a 64-bit hash needs 609 million), SHA1 even lower (1.71 x 10^23 records before a 1% chance of collision), and SHA2 256 or 512 ridiculously low.
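The 1% thresholds can be estimated by inverting the birthday approximation: n ≈ sqrt(2 · 2^bits · ln(1 / (1 − p))). The sketch below assumes uniformly distributed hash outputs and only considers accidental collisions; `values_for_collision_probability` is a name invented for this example.

```python
import math

def values_for_collision_probability(hash_bits: int, probability: float) -> float:
    """Approximate number of random values needed to reach the given collision probability."""
    space = 2.0 ** hash_bits  # number of distinct hash outputs
    # Invert p = 1 - exp(-n^2 / (2N)) to get n = sqrt(2N * ln(1 / (1 - p))).
    return math.sqrt(2.0 * space * -math.log1p(-probability))

if __name__ == "__main__":
    # 32 bits: CHECKSUM; 128: MD5; 160: SHA1; 256 and 512: SHA2.
    for bits in (32, 64, 128, 160, 256, 512):
        n = values_for_collision_probability(bits, 0.01)
        print(f"{bits:>3}-bit hash: about {n:.3g} values for a 1% chance of collision")
```

Under these assumptions a 64-bit hash lands near 609 million values, a 128-bit hash near 2.6 x 10^18, and a 160-bit hash near 1.71 x 10^23.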