Data mining to detect inference problems

Given a database containing both "public" information and "sensitive" information, we can use data mining to search for inference paths from the public to the sensitive information. In other words, a rule with antecedents in the public information and consequents in the sensitive information points to a potential problem. Clustering the public information could conceivably bring together the instances that share such antecedents. This by itself does not pose a problem, but sensitive information known about one instance in a cluster could then be conjectured (correctly) to apply to all of them, greatly expanding the "leak" of sensitive information.
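The search for such rules can be sketched as follows. This is a minimal illustration, not a method given in this paper: it only considers rules with a single public antecedent, and the function name `find_inference_rules`, the thresholds, and the sample records (departments, cities, projects) are all hypothetical.

```python
from collections import Counter


def find_inference_rules(records, public, sensitive,
                         min_support=2, min_confidence=0.9):
    """Find rules (public attr = value) => (sensitive attr = value).

    Sketch only: one antecedent per rule, with hypothetical thresholds.
    Returns a sorted list of (antecedent, consequent, support, confidence).
    """
    antecedent_counts = Counter()  # how often each public value occurs
    pair_counts = Counter()        # how often it co-occurs with a sensitive value

    for rec in records:
        for p in public:
            antecedent = (p, rec[p])
            antecedent_counts[antecedent] += 1
            for s in sensitive:
                pair_counts[(antecedent, (s, rec[s]))] += 1

    rules = []
    for (antecedent, consequent), support in pair_counts.items():
        confidence = support / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical data: "dept" and "city" are public, "project" is sensitive.
records = [
    {"dept": "R&D", "city": "Boston", "project": "X"},
    {"dept": "R&D", "city": "Austin", "project": "X"},
    {"dept": "R&D", "city": "Boston", "project": "X"},
    {"dept": "Sales", "city": "Boston", "project": "Y"},
    {"dept": "Sales", "city": "Austin", "project": "Z"},
]

rules = find_inference_rules(records, public=["dept", "city"],
                             sensitive=["project"])
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "=>", consequent, support, confidence)
```

Any rule this reports is a candidate inference path: in the sample data, knowing the public department of an employee is enough to guess the sensitive project. A real check would also need multi-attribute antecedents, which a standard association-rule miner provides.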

Data mining technology, properly applied, can detect such situations. We can then apply techniques from the previous section to ensure that the data cannot be clustered in meaningful ways.

Our introductory example of mining supermarket sales shows how this may work. We may decide to protect brand information, releasing only sales by product type. However, Dedtrees may know that Green Paper dominates a particular product type, and can combine this fact with data mining to find "brand affinity" information. Preemptive use of data mining (to deduce brand from the information we intend to release) would reveal this relationship, allowing us to rethink our information release strategy.
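One simple form of this preemptive check is sketched below, under stated assumptions. Before releasing sales by product type, the data owner uses the brand-level data it still holds to find product types where a single brand accounts for most sales, since for those types the released figures effectively are brand figures. The function `dominated_product_types`, the 0.8 threshold, the sales amounts, and the brands other than Green Paper are hypothetical.

```python
from collections import defaultdict


def dominated_product_types(sales, threshold=0.8):
    """Flag product types whose sales are dominated by one brand.

    `sales` is an iterable of (product_type, brand, amount) tuples.
    Returns {product_type: (dominant_brand, share)} for every type where
    the top brand's share of sales is at least `threshold`.
    """
    totals = defaultdict(float)
    by_brand = defaultdict(lambda: defaultdict(float))
    for product_type, brand, amount in sales:
        totals[product_type] += amount
        by_brand[product_type][brand] += amount

    flagged = {}
    for product_type, brands in by_brand.items():
        if totals[product_type] <= 0:
            continue  # no sales, nothing to infer
        brand, amount = max(brands.items(), key=lambda item: item[1])
        share = amount / totals[product_type]
        if share >= threshold:
            flagged[product_type] = (brand, share)
    return flagged


# Hypothetical brand-level sales held privately by the data owner.
sales = [
    ("paper towels", "Green Paper", 90),
    ("paper towels", "Brand B", 10),
    ("napkins", "Green Paper", 40),
    ("napkins", "Brand B", 35),
    ("napkins", "Brand C", 25),
]

flagged = dominated_product_types(sales)
for product_type, (brand, share) in flagged.items():
    print(product_type, "is dominated by", brand)
```

For a flagged product type, any affinity mined from the released type-level data can be read as a brand affinity, so that type may need to be merged with others or withheld before release.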

Christopher W Clifton
Fri Aug 23 13:26:29 EDT 1996