large so that individual record values cannot be recovered. Therefore, techniques are designed to derive aggregate distributions from the perturbed records. Subsequently, data mining techniques can be developed in order to work with these aggregate distributions. We will describe the randomization technique in greater detail in a later section.
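As a rough illustration of the randomization idea (not the specific scheme of any cited work), the following sketch perturbs a numeric attribute with additive zero-mean noise whose distribution is publicly known, and then recovers the aggregate mean and variance from the perturbed data alone; the attribute, sample size, and noise scale are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sensitive attribute (e.g., ages); values are illustrative only.
original = rng.normal(loc=40.0, scale=10.0, size=10_000)

# Randomization: publish x_i + r_i, where r_i is drawn from a known,
# zero-mean noise distribution large enough to mask individual values.
noise_std = 25.0
perturbed = original + rng.normal(loc=0.0, scale=noise_std, size=original.shape)

# Individual perturbed records reveal little, but aggregate statistics can
# still be estimated because the noise distribution is public:
#   E[X + R]   = E[X]                (zero-mean noise)
#   Var(X + R) = Var(X) + Var(R)     (independence)
est_mean = perturbed.mean()
est_std = (perturbed.var() - noise_std**2) ** 0.5

print(f"estimated mean ~ {est_mean:.2f} (true {original.mean():.2f})")
print(f"estimated std  ~ {est_std:.2f} (true {original.std():.2f})")
```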
The k-anonymity model and l-diversity: The k-anonymity model was developed because of the possibility of indirect identification of records from public databases, since combinations of record attributes can be used to exactly identify individual records. In the k-anonymity method, we reduce the granularity of data representation with the use of techniques such as generalization and suppression. This granularity is reduced sufficiently that any given record cannot be distinguished from at least k-1 other records in the data. The l-diversity model was designed to handle some weaknesses in the k-anonymity model, since protecting identities to the level of k individuals is not the same as protecting the corresponding sensitive values, especially when there is homogeneity of sensitive values within a group. To address this, the concept of intra-group diversity of sensitive values is promoted within the anonymization scheme [77].
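To make these two definitions concrete, the following sketch checks whether a toy, already generalized table satisfies k-anonymity and the simplest, distinct-values form of l-diversity: every quasi-identifier group must contain at least k records and at least l distinct sensitive values. The table and attribute names are hypothetical.

```python
from collections import defaultdict

def check_k_anonymity_l_diversity(rows, quasi_identifiers, sensitive, k, l):
    """Return (k_ok, l_ok) for a list of dict-valued records that have
    already been generalized on their quasi-identifier attributes."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in quasi_identifiers)
        groups[key].append(row[sensitive])

    k_ok = all(len(vals) >= k for vals in groups.values())       # group size
    l_ok = all(len(set(vals)) >= l for vals in groups.values())  # distinct sensitive values
    return k_ok, l_ok

# Toy table, already generalized on ZIP code and age range.
table = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "hepatitis"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "cancer"},
]

print(check_k_anonymity_l_diversity(table, ["zip", "age"], "disease", k=3, l=2))
# -> (True, True)
```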
Distributed privacy preservation: In many cases, individual entities may wish to derive aggregate results from data sets which are partitioned across these entities. Such partitioning may be horizontal (when the records are distributed across multiple entities) or vertical (when the attributes are distributed across multiple entities). While the individual entities may not desire to share their entire data sets, they may consent to limited information sharing with the use of a variety of protocols. The overall effect of such methods is to maintain privacy for each individual entity, while deriving aggregate results over the entire data.
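One classical example of such a protocol is the secure-sum computation over horizontally partitioned data, sketched below in a single process for readability: the initiating party masks the running total with a random offset known only to itself, so the parties learn the global aggregate but not each other's local contributions (assuming no collusion). The function name and values are illustrative.

```python
import random

def secure_sum(local_values, modulus=2**61 - 1):
    """Simulate a ring-based secure-sum protocol: party 0 adds a random mask,
    every party adds its local value to the running (masked) total, and
    party 0 finally removes the mask to obtain the true sum."""
    mask = random.randrange(modulus)              # known only to party 0
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:                # total passed from party to party
        running = (running + value) % modulus
    return (running - mask) % modulus             # party 0 removes the mask

# Each entry is one party's local aggregate (e.g., its local record count).
print(secure_sum([120, 340, 95, 210]))            # -> 765
```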
Downgrading application effectiveness: In many cases, even though the data may not be available, the output of applications such as association rule mining, classification, or query processing may result in violations of privacy. This has led to research in downgrading the effectiveness of applications by either data or application modifications. Some examples of such techniques include association rule hiding [106], classifier downgrading [83], and query auditing [1].
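As a toy illustration of the query-auditing idea (a simplified stand-in, not the method of the cited work), the sketch below refuses an aggregate query when the queried set is too small, or when it differs from an already answered query by exactly one record, since answering both would expose that record's value.

```python
def audit_query(requested_ids, answered_queries, min_size=2):
    """Decide whether a SUM/COUNT query over `requested_ids` may be answered,
    given the (mutable) list of previously answered query sets."""
    requested = frozenset(requested_ids)
    if len(requested) < min_size:                 # query set itself too small
        return False
    for previous in answered_queries:
        if len(requested ^ previous) == 1:        # difference isolates one record
            return False
    answered_queries.append(requested)
    return True

answered = []
print(audit_query({1, 2, 3, 4}, answered))        # True: safe to answer
print(audit_query({1, 2, 3}, answered))           # False: would reveal record 4
```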
In this paper, we will provide a broad overview of the different techniques for privacy-preserving data mining. We will provide a review of the major algorithms available for each method, and the variations on the different techniques. We will also discuss a number of combinations of different concepts, such as k-anonymous mining over vertically- or horizontally-partitioned data. Finally, we will discuss a number of unique challenges associated with privacy-preserving data mining in the high-dimensional case.
This paper is organized as follows. In section 2, we will introduce the randomization method for privacy-preserving data mining. In section 3, we will discuss the k-anonymization method along with its different variations. In sec-