Databases Reference
In-Depth Information
tion. A second set of methods for distributed privacy-preserving data mining
is discussed in [29] in which the secure multi-party computation of a number
of important data mining primitives is discussed. These methods include the
secure sum, the secure set union, the secure size of set intersection and the
scalar product. These techniques can be used as data mining primitives for
secure multi-party computation over a variety of horizontally and vertically
partitioned data sets. Next, we will discuss algorithms for secure multi-party
computation over horizontally partitioned data sets.
4.1 Distributed Algorithms over Horizontally Partitioned Data
Sets
In horizontally partitioned data sets, different sites contain different sets of
records with the same (or highly overlapping) set of attributes which are
used for mining purposes. Many of these techniques use specialized versions
of the general methods discussed in [29, 36] for various problems. The work
in [74] discusses the construction of a popular decision tree induction method
called ID3 with the use of approximations of the best splitting attributes.
Subsequently, a variety of classifiers have been generalized to the problem of
horizontally-partitioned privacy preserving mining including the Naive Bayes
Classifier [61], and the SVM Classifier with nonlinear kernels [122]. An ex-
treme solution for the horizontally partitioned case is discussed in [120], in
which privacy-preserving classification is performed in a fully distributed set-
ting, where each customer has private access to only their own record. A host
of other data mining applications have been generalized to the problem of hor-
izontally partitioned data sets. These include the applications of association
rule mining [60], clustering [53, 58, 59] and collaborative filtering [92]. Meth-
ods for cooperative statistical analysis using secure multi-party computation
methods are discussed in [37, 38].
A related problem is that of information retrieval and document index-
ing in a network of content providers. This problem arises in the context of
multiple providers which may need to cooperate with one another in sharing
their content, but may essentially be business competitors. In [15], it has been
discussed how an adversary may use the output of search engines and con-
tent providers in order to reconstruct the documents. Therefore, the level of
trust required grows with the number of content providers. A solution to this
problem [15] constructs a centralized privacy-preserving index in conjunction
with a distributed access control mechanism. The privacy-preserving index
maintains strong privacy guarantees even in the face of colluding adversaries,
and even if the entire index is made public.
4.2 Distributed Algorithms over Vertically Partitioned Data
For the vertically partitioned case, many primitive operations such as com-
puting the scalar product or the secure set size intersection can be useful in
Search WWH ::




Custom Search