Privacy-Preserving Data Mining

Duncan Greaves gives us a few options for mining data while maintaining user anonymity:

In pseudonymisation, matching data sets at individual row level is done using key fields, which are then pseudonymised for consumption. Candidates for key fields include those combinations that are most often used to match the datasets, e.g. DoB/Gender/Postcode, credit card numbers, IP addresses or email identifiers. Allocation of persistent pseudonyms are used to build up profiles over time to allow data mining to happen in a privacy sensitive way.

All methods for privacy aware data mining carry additional complexity associated with creating pools of data from which secondary use can be made, without compromising the identity of the individuals who provided the data. Pseudonymisation can act as the best compromise between full anonymisation and identity in many scenarios where it is essential that the identity is preserved, whilst minimising the risks of re-identification beyond reasonable means.

Read the whole thing.

Related Posts

Preventing Credential Compromise When Using AWS

Will Bengtston walks us through techniques Netflix uses to protect credentials in AWS: Scope In this post, we’ll discuss how to prevent or mitigate compromise of credentials due to certain classes of vulnerabilities such as Server Side Request Forgery (SSRF) and XML External Entity (XXE) injection. If an attacker has remote code execution (RCE) or […]

Read More

Cross-Availability Group Login Management

David Fowler walks us through a problem about orphaned users and Availability Groups: Now, I’m pretty sure that most of us will have been in the position where, after a fail-over we get inundated with calls, emails, Skype messages and carrier pigeon drops letting us know that so and so can no longer access the […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930