Privacy-Preserving Data Mining

Duncan Greaves gives us a few options for mining data while maintaining user anonymity:

In pseudonymisation, matching data sets at individual row level is done using key fields, which are then pseudonymised for consumption. Candidates for key fields include those combinations that are most often used to match the datasets, e.g. DoB/Gender/Postcode, credit card numbers, IP addresses or email identifiers. Allocation of persistent pseudonyms are used to build up profiles over time to allow data mining to happen in a privacy sensitive way.

All methods for privacy aware data mining carry additional complexity associated with creating pools of data from which secondary use can be made, without compromising the identity of the individuals who provided the data. Pseudonymisation can act as the best compromise between full anonymisation and identity in many scenarios where it is essential that the identity is preserved, whilst minimising the risks of re-identification beyond reasonable means.

Read the whole thing.

Related Posts

Backing Up SSRS Encryption Keys

Jonathan Kehayias reminds us to back up those SQL Server Reporting Services encryption keys: If you run SQL Server Reporting Services, part of your DR plan needs to include a backup of the encryption key for SSRS. This sadly is an all to often overlooked part of the solution, even though it is incredibly easy […]

Read More

L-Diversity versus K-Anonymity

Duncan Greaves explains the concepts behind l-diversity: There are problems with K-anonymous datasets, namely the homogeneous pattern attack, and the background knowledge attack, details of which are in my original post. A slightly different approach to anonymising public datasets comes in the form of ℓ -diversity, a way of introducing further entropy/diversity into a dataset. […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930