Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends

Databases Reference

In-Depth Information

tion. A second set of methods for distributed privacy-preserving data mining

is discussed in [29] in which the secure multi-party computation of a number

of important data mining primitives is discussed. These methods include the

secure sum, the secure set union, the secure size of set intersection and the

scalar product. These techniques can be used as data mining primitives for

secure multi-party computation over a variety of horizontally and vertically

partitioned data sets. Next, we will discuss algorithms for secure multi-party

computation over horizontally partitioned data sets.

4.1 Distributed Algorithms over Horizontally Partitioned Data

Sets

In horizontally partitioned data sets, different sites contain different sets of

records with the same (or highly overlapping) set of attributes which are

used for mining purposes. Many of these techniques use specialized versions

of the general methods discussed in [29, 36] for various problems. The work

in [74] discusses the construction of a popular decision tree induction method

called ID3 with the use of approximations of the best splitting attributes.

Subsequently, a variety of classifiers have been generalized to the problem of

horizontally-partitioned privacy preserving mining including the Naive Bayes

Classifier [61], and the SVM Classifier with nonlinear kernels [122]. An ex-

treme solution for the horizontally partitioned case is discussed in [120], in

which privacy-preserving classification is performed in a fully distributed set-

ting, where each customer has private access to only their own record. A host

of other data mining applications have been generalized to the problem of hor-

izontally partitioned data sets. These include the applications of association

rule mining [60], clustering [53, 58, 59] and collaborative filtering [92]. Meth-

ods for cooperative statistical analysis using secure multi-party computation

methods are discussed in [37, 38].

A related problem is that of information retrieval and document index-

ing in a network of content providers. This problem arises in the context of

multiple providers which may need to cooperate with one another in sharing

their content, but may essentially be business competitors. In [15], it has been

discussed how an adversary may use the output of search engines and con-

tent providers in order to reconstruct the documents. Therefore, the level of

trust required grows with the number of content providers. A solution to this

problem [15] constructs a centralized privacy-preserving index in conjunction

with a distributed access control mechanism. The privacy-preserving index

maintains strong privacy guarantees even in the face of colluding adversaries,

and even if the entire index is made public.

4.2 Distributed Algorithms over Vertically Partitioned Data

For the vertically partitioned case, many primitive operations such as com-

puting the scalar product or the secure set size intersection can be useful in

Search WWH ::

Custom Search

Home