Database technology provides a number of advantages. Data mining is one of these: using automated tools to analyze corporate data can help find ways to increase the efficiency of an organization.
Another advantage of database technology is information sharing (including sharing with other organizations). For example, publicly accessible corporate telephone books can decrease the need for telephone operators (offloading this task to the caller). Sharing need not be completely public: making inventory information available to suppliers can help a retail operation avoid shortages, and can lower the supplier's costs (thus allowing the retailer to negotiate a better price).
These two advantages, when combined, can become a disadvantage. For example, mining a corporate directory to determine staffing of a particular project (and changes in staffing) could help a competitor to determine product rollout dates, allowing preemptive marketing campaigns. Mining retailer inventory data could allow a supplier to determine sales and supplies of competing products, leading to pricing and marketing strategies aimed at reducing the competition (this would be unlikely to be in the best interest of the retailer providing the data).
We will describe this with a more detailed example. Let us suppose that we (as purchasing directors of BigMart, a large supermarket chain) are negotiating a deal with the Dedtrees paper company. They offer to give us a reduced price if we give them access to our database of customer purchases (they say this will allow them to track inventory to allow ``just-in-time'' production and stocking and reduce their warehouse costs). We accept this deal.
Unbeknownst to us, Dedtrees now starts mining our data. Using a tool for mining sequences [AFS93], they find that customers who purchase cold remedies later purchase facial tissue (allowing them to stock us in advance). They also find (using association rule mining [AIS93]) that people who purchase skim milk also purchase Green paper. Dedtrees now runs a coupon marketing campaign, ``50 cents off skim milk when you buy Dedtrees products'', cutting heavily into the sales of Green paper, who then raise their prices to us because of the lower sales. When we next go to negotiate with Dedtrees, we find that with reduced competition, they are unwilling to offer us as low a price, and we start to lose business to our competitors (who were able to negotiate a better deal with Green paper).
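To make the association-rule step concrete, the following is a minimal sketch of how the support and confidence of a rule such as "skim milk implies Green paper" could be computed from purchase records. The baskets, the item labels, and the function name rule_stats are hypothetical and chosen only for illustration; this is not the algorithm of [AIS93], which also addresses how to search efficiently for candidate rules.

```python
# Hypothetical purchase baskets; item labels are illustrative only.
baskets = [
    {"skim milk", "Green paper", "bread"},
    {"skim milk", "Green paper"},
    {"skim milk", "Dedtrees tissue"},
    {"whole milk", "Dedtrees tissue"},
    {"skim milk", "Green paper", "cold remedy"},
    {"whole milk", "bread"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = fraction of all baskets containing both item sets
    confidence = fraction of baskets containing the antecedent
                 that also contain the consequent
    """
    total = len(baskets)
    # Baskets that contain every item of the antecedent.
    with_antecedent = sum(1 for b in baskets if antecedent <= b)
    # Baskets that contain every item of both sides of the rule.
    with_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = with_both / total if total else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, {"skim milk"}, {"Green paper"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

On this toy data the rule holds in half of all baskets and in three of the four baskets that contain skim milk. A strong rule of this kind is what would suggest the coupon campaign described above.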
What are we to do about this? Providing Dedtrees access to our database is good (it improves the efficiency of distribution, lowering costs). Allowing them to mine this data can be good (it helps them to predict inventory needs), but it can also be bad (it allows them to market so as to reduce competition). Some ideas are:
Alternatively, we may want to prevent mining altogether (such as on a corporate telephone directory). We want people to be able to retrieve individual facts, but we want to protect any generalizations that can be formed from mining the data.
In this paper we discuss some specific possibilities; things that can be done to prevent undesirable mining of published data. Section 4 presents directions for research that will support these ends (both direct research on this problem, and work in data mining that can support work on data mining prevention). We will first give an overview of related work that may help to outline this research area.