Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends


18

Privacy-Preserving Data Mining: A Survey

Charu C. Aggarwal and Philip S. Yu

IBM T. J. Watson Research Center

19 Skyline Drive

Hawthorne, NY 10532

{charu,psyu}@us.ibm.com

Summary. In recent years, privacy-preserving data mining has been studied extensively because of the wide proliferation of sensitive information on the internet. A number of algorithmic techniques have been designed for privacy-preserving data mining. In this paper, we provide a review of the state-of-the-art methods for privacy. We discuss methods for randomization, k-anonymization, and distributed privacy-preserving data mining. We also discuss cases in which the output of data mining applications needs to be sanitized for privacy-preservation purposes. Finally, we discuss the computational and theoretical limits associated with privacy-preservation over high-dimensional data sets.

1 Introduction

In recent years, data mining has been viewed as a threat to privacy because of the widespread proliferation of electronic data maintained by corporations. This has led to increased concerns about the privacy of the underlying data. Consequently, a number of techniques have been proposed for modifying or transforming the data in such a way as to preserve privacy. A survey of some of the techniques used for privacy-preserving data mining may be found in [105]. In this chapter, we provide an overview of the state of the art in privacy-preserving data mining.

Most methods for privacy computations apply some form of transformation to the data in order to preserve privacy. Typically, such methods reduce the granularity of representation in order to protect privacy. This reduction in granularity results in some loss of effectiveness of data management or mining algorithms. This is the natural trade-off between information loss and privacy. Some examples of such techniques are as follows:

• The randomization method: The randomization method is a technique for privacy-preserving data mining in which noise is added to the data in order to mask the attribute values of records [2, 5]. The noise added is sufficiently
