The problem of providing access to certain types of information while
restricting access to other types of information (access control) has
been studied extensively by the database security community. In
particular, multilevel secure databases [ST90] have been developed as
a means of enforcing access control policies that limit the sharing of
information to those with appropriate authority.
However, the adequacy of such controls has always been suspect because
of the "inference problem": how do we ensure that "private" data cannot
be inferred from "public" data? Data mining raises the risk in such
situations, since it automates this inference process. There has been
progress in this area [Marar], but data mining also poses a new
problem: the "inferences" we find are not exact. That is, a rule "A
implies B" with confidence 25% (B holds in only a quarter of the cases
where A holds) may be damaging from a business point of view, but
because it is not strictly true, existing inference-control techniques
would not treat it as a problem.
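To make the distinction concrete, the following Python sketch computes the confidence of a rule "A implies B" over a small, made-up set of transactions. The helper names (`support`, `confidence`, `exact_inference_check`, `threshold_check`) and the 0.2 cutoff are illustrative assumptions, not techniques from the cited work: an exact check flags only rules that always hold, while a threshold check flags any rule at or above a chosen confidence.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(transactions: List[Itemset], items: Itemset) -> int:
    """Count the transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def confidence(transactions: List[Itemset], antecedent: Itemset, consequent: Itemset) -> float:
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


def exact_inference_check(transactions: List[Itemset], antecedent: Itemset, consequent: Itemset) -> bool:
    """Hypothetical check that flags a rule only if it always holds (confidence 1)."""
    return confidence(transactions, antecedent, consequent) == 1.0


def threshold_check(transactions: List[Itemset], antecedent: Itemset, consequent: Itemset,
                    min_conf: float = 0.2) -> bool:
    """Hypothetical check that flags any rule at or above an assumed confidence cutoff."""
    return confidence(transactions, antecedent, consequent) >= min_conf


# Made-up data: 8 transactions contain A, and only 2 of those also contain B.
data = [frozenset({"A", "B"})] * 2 + [frozenset({"A"})] * 6 + [frozenset({"C"})] * 4
A, B = frozenset({"A"}), frozenset({"B"})

print(confidence(data, A, B))             # 0.25
print(exact_inference_check(data, A, B))  # False: the rule is not strictly true
print(threshold_check(data, A, B))        # True: 0.25 >= 0.2
```

Choosing `min_conf` is itself the open question: the sketch shows where such a threshold would sit, not what its value ought to be.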
The secure database community has developed techniques that may help
us to control undesired mining. An integral part of the data mining
procedure is the accumulation of a sufficiently large quantity of
information that generalizations, or rules, can be postulated. This
"aggregation problem" has been studied most recently in [MMJ94].
Aggregation control techniques can help us to limit access to data,
but we are still faced with the problem of determining when an
aggregate is too large. In other words, even if we know how to control
access to the data, we do not know how much access is too much.
A seemingly similar problem is disclosure control in statistical
databases: allowing users to gather general statistics on data while
preventing access to individual items (e.g., preventing access to data
on individuals through queries over census data [oSM94]). Our concern
is exactly the opposite: we want to allow access to individual records
but prevent generalizations over the whole. However, disclosure
control provides a good model of what we would like to see: a strong
notion of "how much is too much" when allowing statistical queries.
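One classic form of "how much is too much" in statistical databases is query-set-size restriction (a well-known disclosure control, not one this section attributes to [oSM94]): refuse any statistical query whose query set has fewer than k records or more than N - k. The Python sketch below assumes a hypothetical `StatisticalDB` class, a sum-only query interface, made-up census-style records, and k = 3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, int]


@dataclass
class StatisticalDB:
    records: List[Record]
    k: int  # assumed minimum size of a query set and of its complement

    def sum_query(self, predicate: Callable[[Record], bool], field: str) -> Optional[int]:
        """Return the sum of `field` over matching records, or None if refused.

        A query is answered only when k <= |query set| <= N - k, so neither a
        small set nor the small complement of a large set can single out individuals.
        """
        query_set = [r for r in self.records if predicate(r)]
        n, size = len(self.records), len(query_set)
        if size < self.k or size > n - self.k:
            return None
        return sum(r[field] for r in query_set)


# Made-up records standing in for census-style data.
records = [{"id": i, "age": 20 + i, "income": 1000 * (i + 1)} for i in range(10)]
db = StatisticalDB(records, k=3)

print(db.sum_query(lambda r: r["age"] >= 25, "income"))  # 40000 (5 records)
print(db.sum_query(lambda r: r["id"] == 4, "income"))    # None: one individual
print(db.sum_query(lambda r: r["id"] != 4, "income"))    # None: complement of one
```

On its own this control is known to be weak: a "tracker" can combine several permitted queries to recover the answer to a refused one. It is shown here only to illustrate the kind of explicit limit that the data mining setting currently lacks.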
Christopher W Clifton
Fri Aug 23 13:26:29 EDT 1996