Ninghui Li Wins ICDE 2017 Test of Time Award
Writer(s): Kristyn Childres
Professor Ninghui Li won the ICDE 2017 Test of Time (10-Year Award) for his 2007 paper “t-Closeness: Privacy Beyond k-Anonymity and ℓ-Diversity,” written with Tiancheng Li (Ph.D. ’10) and Suresh Venkatasubramanian. They received the award at the IEEE International Conference on Data Engineering (ICDE) in San Diego.
Agencies and other organizations often need to publish microdata, such as medical or census data, for research purposes. Such data is stored in a table where each record has a number of attributes (like name, social security number, and date of birth) that could identify the person it relates to, either directly or indirectly. Some attributes (such as disease or salary) are sensitive, so it is necessary to minimize the risk that they are disclosed about any individual when the data is released.
To keep data private, explicit identifiers (like name and social security number) are removed. Then, quasi-identifiers (like address and birth date) are replaced with values that are less specific but semantically consistent, through a process called generalization. As a result, more records will have the same set of quasi-identifier values.
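As a rough illustration of generalization, the Python sketch below drops explicit identifiers and coarsens two quasi-identifiers. The field names (`name`, `ssn`, `zip`, `birth_date`, `disease`), the sample records, and the coarsening rules (keep the first three ZIP digits, reduce a birth date to its decade) are all assumptions made for this example, not details from the paper.

```python
def generalize(record):
    """Return a publishable version of one record.

    Explicit identifiers (name, ssn) are simply not copied over.
    Quasi-identifiers are replaced with coarser but consistent values.
    """
    return {
        "zip": record["zip"][:3] + "**",           # '47906' -> '479**'
        "birth": record["birth_date"][:3] + "0s",  # '1984-03-12' -> '1980s'
        "disease": record["disease"],              # sensitive value kept for research
    }


# Hypothetical raw records, invented for illustration only.
raw = [
    {"name": "Alice", "ssn": "000-00-0001", "zip": "47906",
     "birth_date": "1984-03-12", "disease": "flu"},
    {"name": "Bob", "ssn": "000-00-0002", "zip": "47907",
     "birth_date": "1988-11-30", "disease": "asthma"},
]
published = [generalize(r) for r in raw]
```

After this step the two records carry identical quasi-identifier values, so neither can be singled out by ZIP code and birth date alone.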
k-anonymity is a privacy requirement for publishing microdata: any set of records that are indistinguishable from each other with respect to certain identifying attributes must contain at least k records. Previous work has shown that this requirement is flawed. While k-anonymity protects against a patient’s identity being disclosed, it does not provide sufficient protection against the disclosure of sensitive attributes.
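The k-anonymity requirement can be checked mechanically by grouping records on their quasi-identifier values and counting each group. The following is a minimal sketch; the function name `is_k_anonymous`, the column names, and the sample table are hypothetical and chosen only to make the limitation visible.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())


# Hypothetical generalized table with two groups of two records each.
table = [
    {"zip": "479**", "birth": "1980s", "disease": "cancer"},
    {"zip": "479**", "birth": "1980s", "disease": "cancer"},
    {"zip": "478**", "birth": "1970s", "disease": "flu"},
    {"zip": "478**", "birth": "1970s", "disease": "asthma"},
]

two_anonymous = is_k_anonymous(table, ["zip", "birth"], k=2)
three_anonymous = is_k_anonymous(table, ["zip", "birth"], k=3)
```

The sample table satisfies the requirement for k = 2, yet anyone known to fall in the first group is revealed to have cancer. Identity stays hidden while the sensitive attribute does not.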
To address the flaws of k-anonymity, other researchers have proposed a notion called ℓ-diversity, which requires that each equivalence class (a set of records that share the same quasi-identifier values) has at least ℓ well-represented values for each sensitive attribute.
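"Well-represented" can be formalized in more than one way. The sketch below assumes the simplest reading, in which a class qualifies if it contains at least ℓ distinct sensitive values; the function name `is_distinct_l_diverse` and the sample data are hypothetical.

```python
from collections import defaultdict


def is_distinct_l_diverse(rows, quasi_ids, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values.

    Assumption: 'well-represented' is read as 'distinct', the simplest variant.
    """
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)  # identifies the equivalence class
        classes[key].add(row[sensitive])
    return all(len(values) >= l for values in classes.values())


# Hypothetical table: the first class has one disease, the second has two.
table = [
    {"zip": "479**", "disease": "cancer"},
    {"zip": "479**", "disease": "cancer"},
    {"zip": "478**", "disease": "flu"},
    {"zip": "478**", "disease": "asthma"},
]

diverse = is_distinct_l_diverse(table, ["zip"], "disease", l=2)
```

Here the check fails for ℓ = 2 because every record in the `479**` class has the same disease.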
In this paper, the authors show that ℓ-diversity also has a number of limitations. They then propose a new privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t).
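A t-closeness check compares each equivalence class's distribution of the sensitive attribute with the distribution over the whole table. The sketch below is one possible reading under a stated assumption: it measures distance as total variation distance (half the sum of absolute differences in probability), which is a simple choice for categorical values and not necessarily the measure the authors use. All function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Map each value to its relative frequency in the list."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (assumed measure)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_class_distance(rows, quasi_ids, sensitive):
    """Largest distance between any class distribution and the overall one."""
    overall = distribution([row[sensitive] for row in rows])
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].append(row[sensitive])
    return max(total_variation(distribution(v), overall) for v in classes.values())


def is_t_close(rows, quasi_ids, sensitive, t):
    """True if no class distribution is farther than t from the overall one."""
    return max_class_distance(rows, quasi_ids, sensitive) <= t


# Hypothetical table: overall, 3 of 4 records are 'flu' and 1 of 4 is 'cancer'.
table = [
    {"zip": "479**", "disease": "flu"},
    {"zip": "479**", "disease": "flu"},
    {"zip": "478**", "disease": "flu"},
    {"zip": "478**", "disease": "cancer"},
]

worst = max_class_distance(table, ["zip"], "disease")
```

A smaller threshold t forces every class to look more like the table as a whole, so learning which class a person belongs to reveals less about their sensitive value.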
Li’s paper has been cited more than 1,800 times. Read the full text of the paper here.