t-closeness And Data Anonymity

Kevin Feasel

2018-11-02

Data

John Cook shares some thoughts about k-anonymity and t-closeness:

The idea of k-anonymity is that every database record appears at least k times. If you have a lot of records and few fields, your value of k could be high. But as you get more fields, it becomes more likely that a combination of fields is unique. If k = 1, then k-anonymity offers no anonymity.

Another problem with k-anonymity is that it doesn’t offer group privacy. A database could be k-anonymous but reveal information about a group if that group is homogeneous with respect to some field. That is, the method is subject to a homogeneity attack.

This is intended to be a “get you thinking” type of post, and John does have links to related posts which flesh things out a bit more.

Related Posts

Null Checks in Spark DataFrames

Bipin Patwardhan gives us four techniques for validating whether data in Spark exists: The task at hand was pretty simple — we wanted to create a flexible and reusable library of classes that would make the task of data validation (over Spark DataFrames) a breeze. In this article, I will cover a couple of techniques/idioms […]

Read More

“Big” Data

Buck Woody explains that “Big Data” is just data: A few years ago it was all the rage to talk about “Big Data”. Lots of descriptions of “Big Data” popped up, including the “V’s” (Variety, Velocity, Volume, etc.) that proved very helpful. I even have my own definition: Big Data is any data you can’t […]

Read More

Categories

November 2018
MTWTFSS
« Oct Dec »
 1234
567891011
12131415161718
19202122232425
2627282930