Shuffling Data And Zipping Results In T-SQL

Kevin Feasel

2018-08-14

T-SQL

Phil Factor continues his series on pseudonymization:

The problems come with uncommon values. If you are pseudonymizing a medical database that is required for research purposes on people with potentially embarrassing diseases, and it appears on the dark web, anyone with a rare or unusual surname or first-name comes up on the list, so the shuffle doesn’t help the privacy of Fortescue Ceresole, or whatever his name may be.

If you are spoofing data entirely, you don’t necessarily have this problem because your constructed value will have no relationship to the original value. If it comes from a list of common names or if you randomly create a name ‘Thomas’, it will have no relationship to the original names in the database as long as you did things correctly and shuffle the list. Although a Markov string can produce an identical name that is uncommon, it can be eliminated from the list by an outer join with the original data.

After you shuffle data, you ‘zip’ it. Zipping lists is something you come across in procedural programming, and Linq has a good example. A .net array has an order, and all you are doing is to join by the order of the element in the list. If you randomize that order, you get a shuffle.

Read on for an example using the AdventureWorks Person.Person table.

Related Posts

Trailing Spaces and String Comparisons

Bert Wagner shows how SQL Server handles trailing spaces when comparing two strings: The LEN() function shows the number of characters in our string, while the DATALENGTH() function shows us the number of bytes used by that string. In this case, DATALENGTH is equal to 10. This result is due to the padded spaces occurring […]

Read More

T-SQL Bugs with Joins

Itzik Ben-Gan takes us through four bugs or oddities around joins: The order counts are now correct, but the total freight values are not. Can you spot the new bug? The new bug is more elusive because it manifests itself only when the same customer has at least one case where multiple orders happen to […]

Read More

Categories

August 2018
MTWTFSS
« Jul Sep »
 12345
6789101112
13141516171819
20212223242526
2728293031