Related Work

The problem of providing access to certain types of information while restricting access to others (access control) has been studied extensively by the database security community. In particular, multilevel secure databases [ST90] have been developed as a means of enforcing access control policies that limit the sharing of information to those with appropriate authority. However, the adequacy of such controls has always been suspect because of the "inference problem": how do we ensure that "private" data cannot be inferred from "public" data? Data mining raises the risk in such situations, since it automates this inference process. There has been progress in this area [Marar], but data mining also poses a new problem: the inferences it finds are not strict implications. That is, a rule "A implies B" with confidence 25% may be damaging from a business point of view, but since it is not strictly true, existing inference-control techniques would not treat it as a problem.
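To make the notion of a rule with 25% confidence concrete, the following is a minimal sketch that computes confidence under the usual association-rule definition (the fraction of records containing A that also contain B). The function name `rule_confidence` and the toy transactions are illustrative assumptions, not taken from the work cited above.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Defined here as: among the transactions that contain every item of
    `antecedent`, the fraction that also contain every item of `consequent`.
    """
    antecedent = set(antecedent)
    consequent = set(consequent)
    # Transactions in which the left-hand side of the rule holds.
    matching = [t for t in transactions if antecedent <= set(t)]
    if not matching:
        return 0.0  # rule never applies; report zero confidence
    both = sum(1 for t in matching if consequent <= set(t))
    return both / len(matching)


# Hypothetical toy data: A appears in four transactions, A with B in one.
transactions = [
    {"A", "B"},
    {"A"},
    {"A", "C"},
    {"A", "C"},
    {"B", "C"},
]
conf = rule_confidence(transactions, {"A"}, {"B"})
print(conf)  # 0.25
```

A rule like this is false for three of the four records in which A holds, so a control that only blocks strict logical inferences would have no reason to flag it.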

The secure database community has developed techniques that may help us control undesired mining. An integral part of the data mining procedure is the accumulation of a quantity of information large enough that generalizations, or rules, may be postulated. This "aggregation problem" has been studied most recently in [MMJ94]. Aggregation control techniques can help us to limit access to data, but we are still faced with the problem of determining when an aggregate is too large. In other words, even if we know how to control access to the data, we do not know how much access is too much.

One area that poses a similar-looking problem is disclosure control in statistical databases: allowing users to gather general statistics on data while preventing access to individual items (e.g., preventing queries over census data from revealing data on individuals [oSM94]). Our concern is exactly the opposite: we want to allow access to individual items but prevent generalizations over the whole. However, disclosure control provides a good model of what we would like to see: a strong notion of "how much is too much" when allowing statistical queries.
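As an illustration of what a concrete "how much is too much" rule can look like in the statistical-database setting, here is a minimal sketch of query-set-size restriction, a textbook disclosure control. It is not claimed to be the method of [oSM94]; the class name `RestrictedStats`, the threshold `min_set_size`, and the sample records are all hypothetical.

```python
class RestrictedStats:
    """Answers aggregate queries only when the query set is neither too
    small nor too close to the whole table (query-set-size restriction)."""

    def __init__(self, records, min_set_size):
        self._records = list(records)
        self._min = min_set_size

    def mean(self, field, predicate):
        matched = [r[field] for r in self._records if predicate(r)]
        n = len(matched)
        total = len(self._records)
        # Refuse small query sets (they isolate individuals) and very large
        # ones (their complement would isolate individuals).
        if n < self._min or n > total - self._min:
            raise PermissionError("query set size outside permitted range")
        return sum(matched) / n


# Hypothetical census-like records.
records = [
    {"id": 1, "region": "N", "income": 30},
    {"id": 2, "region": "N", "income": 40},
    {"id": 3, "region": "N", "income": 50},
    {"id": 4, "region": "S", "income": 60},
    {"id": 5, "region": "S", "income": 70},
    {"id": 6, "region": "S", "income": 80},
]
db = RestrictedStats(records, min_set_size=2)

north_mean = db.mean("income", lambda r: r["region"] == "N")  # permitted

try:
    db.mean("income", lambda r: r["id"] == 1)  # would expose one individual
    refused = False
except PermissionError:
    refused = True
```

The threshold gives an explicit bound on permitted queries. Size restriction alone is known to be vulnerable to combinations of overlapping queries, so it is a starting point rather than a complete control, and the setting discussed here would need the reverse kind of bound: one on generalizations rather than on individual items.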

Christopher W Clifton
Fri Aug 23 13:26:29 EDT 1996