Databases Reference
In-Depth Information
l -diversity problem because especially challenging. Methods have been pro-
posed in [77] for constructing l -diverse tables from the data set, though the
technique remains susceptible to the curse of dimensionality [7]. Other meth-
ods for creating l -diverse tables are discussed in [115], in which a simple and
ecient method for constructing the l -diverse representation is proposed.
4 Distributed Privacy-Preserving Data Mining
The key goal in most distributed methods for privacy-preserving data min-
ing is to allow computation of useful aggregate statistics over the entire data
set without compromising the privacy of the individual data sets within the
different participants. Thus, the participants may wish to collaborate in ob-
taining aggregate results, but may not fully trust each other in terms of the
distribution of their own data sets. For this purpose, the data sets may either
be horizontally partitioned or be vertically partitioned . In horizontally parti-
tioned data sets, the individual records are spread out across multiple entities,
each of which have the same set of attributes. In vertical partitioning, the in-
dividual entities may have different attributes (or views) of the same set of
records. Both kinds of partitioning pose different challenges to the problem of
distributed privacy-preserving data mining.
The problem of distributed privacy-preserving data mining overlaps closely
with a field in cryptography for determining secure multi-party computations.
A broad overview of the intersection between the fields of cryptography and
privacy-preserving data mining may be found in [90]. The broad approach
to cryptographic methods tends to compute functions over inputs provided
by multiple recipients without actually sharing the inputs with one another.
For example, in a 2-party setting, Alice and Bob may have two inputs x and
y respectively, and may wish to both compute the function f ( x, y ) without
revealing x or y to each other. This problem can also be generalized across k
parties by designing the k argument function h ( x 1 ...x k ). Many data mining
algorithms may be viewed in the context of repetitive computations of many
such primitive functions such as the scalar dot product, secure sum etc. In
order to compute the function f ( x, y )or h ( x 1 ...,x k ), a protocol will have
to designed for exchanging information in such a way that the function is
computed without compromising privacy. We note that the robustness of the
protocol depends upon the level of trust one is willing to place on the two
participants Alice and Bob. This is because the protocol may be subjected to
various kinds of adversarial behavior:
Semi-honest Adversaries: In this case, the participants Alice and Bob
are curious and attempt to learn from the information received by them
during the protocol, but do not deviate from the protocol themselves. In
many situations, this may be considered a realistic model of adversarial
behavior.
Search WWH ::




Custom Search