18
Privacy-Preserving Data Mining: A Survey
Charu C. Aggarwal and Philip S. Yu
IBM T. J. Watson Research Center
19 Skyline Drive
Hawthorne, NY 10532
{charu,psyu}@us.ibm.com
Summary. In recent years, privacy-preserving data mining has been studied
extensively because of the wide proliferation of sensitive information on the internet.
A number of algorithmic techniques have been designed for privacy-preserving data
mining. In this paper, we provide a review of the state-of-the-art methods for privacy.
We discuss methods for randomization, k-anonymization, and distributed privacy-
preserving data mining. We also discuss cases in which the output of data mining
applications needs to be sanitized for privacy-preservation purposes. We discuss the
computational and theoretical limits associated with privacy preservation over
high-dimensional data sets.
1 Introduction
In recent years, data mining has been viewed as a threat to privacy because
of the widespread proliferation of electronic data maintained by corporations.
This has led to increased concerns about the privacy of the underlying data.
A number of techniques have since been proposed for modifying or
transforming the data in such a way as to preserve privacy. A survey of
some of the techniques used for privacy-preserving data mining may be found
in [105]. In this chapter, we provide an overview of the state of the art in
privacy-preserving data mining.
Most methods for privacy-preserving computation apply some form of
transformation to the data in order to preserve privacy. Typically, such
methods reduce the granularity of representation in order to protect
privacy. This reduction in granularity results in some loss of effectiveness of data
management or mining algorithms. This is the natural trade-off between
information loss and privacy. Some examples of such techniques are as follows:
The randomization method: The randomization method is a technique for
privacy-preserving data mining in which noise is added to the data in order
to mask the attribute values of records [2, 5]. The noise added is sufficiently