Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends

Databases Reference

In-Depth Information

l -diversity problem because especially challenging. Methods have been pro-

posed in [77] for constructing l -diverse tables from the data set, though the

technique remains susceptible to the curse of dimensionality [7]. Other meth-

ods for creating l -diverse tables are discussed in [115], in which a simple and

ecient method for constructing the l -diverse representation is proposed.

4 Distributed Privacy-Preserving Data Mining

The key goal in most distributed methods for privacy-preserving data min-

ing is to allow computation of useful aggregate statistics over the entire data

set without compromising the privacy of the individual data sets within the

different participants. Thus, the participants may wish to collaborate in ob-

taining aggregate results, but may not fully trust each other in terms of the

distribution of their own data sets. For this purpose, the data sets may either

be horizontally partitioned or be vertically partitioned . In horizontally parti-

tioned data sets, the individual records are spread out across multiple entities,

each of which have the same set of attributes. In vertical partitioning, the in-

dividual entities may have different attributes (or views) of the same set of

records. Both kinds of partitioning pose different challenges to the problem of

distributed privacy-preserving data mining.

The problem of distributed privacy-preserving data mining overlaps closely

with a field in cryptography for determining secure multi-party computations.

A broad overview of the intersection between the fields of cryptography and

privacy-preserving data mining may be found in [90]. The broad approach

to cryptographic methods tends to compute functions over inputs provided

by multiple recipients without actually sharing the inputs with one another.

For example, in a 2-party setting, Alice and Bob may have two inputs x and

y respectively, and may wish to both compute the function f ( x, y ) without

revealing x or y to each other. This problem can also be generalized across k

parties by designing the k argument function h ( x 1 ...x k ). Many data mining

algorithms may be viewed in the context of repetitive computations of many

such primitive functions such as the scalar dot product, secure sum etc. In

order to compute the function f ( x, y )or h ( x 1 ...,x k ), a protocol will have

to designed for exchanging information in such a way that the function is

computed without compromising privacy. We note that the robustness of the

protocol depends upon the level of trust one is willing to place on the two

participants Alice and Bob. This is because the protocol may be subjected to

various kinds of adversarial behavior:

•

Semi-honest Adversaries: In this case, the participants Alice and Bob

are curious and attempt to learn from the information received by them

during the protocol, but do not deviate from the protocol themselves. In

many situations, this may be considered a realistic model of adversarial

behavior.

Search WWH ::

Custom Search

Home