How to Prevent Private Data from being Disclosed to a Malicious Attacker - Data Mining: Foundations and Practice


and information sharing to support coalitions in which different organizations
and nations must share some, but not all, information. Information privacy
thus becomes extremely important: all parties to the collaboration promise
to contribute their private data, but none of them wants the others, or any
outside party, to learn much about that data.

Without privacy concerns, all parties can send their data to a trusted
central place to conduct the mining. However, in situations with privacy
concerns, the parties may not trust anyone. We call this type of problem the
Privacy-preserving Collaborative Data Mining problem. As stated above, in
this paper we are interested in homogeneous collaboration, where each party
has the same set of attributes [15] but a different set of instances.

Data mining includes a number of different tasks, such as association rule
mining, classification, and clustering. This paper studies how to learn
support vector machines. In the last few years, there has been a surge of
interest in Support Vector Machines (SVMs) [28, 29]. The SVM is a powerful
methodology for solving problems in nonlinear classification, function
estimation, and density estimation, and it has led to many other recent
developments in kernel-based learning methods in general [7, 24, 25]. SVMs
were introduced within the context of statistical learning theory and
structural risk minimization. As part of the SVM algorithm, one solves convex
optimization problems, typically quadratic programs. It has been shown
empirically that SVMs generalize well in many applications, such as text
categorization [13], face detection [20], and handwritten character
recognition [16]. Building on existing SVM learning techniques, we study the
problem of learning Support Vector Machines on private data. More precisely,
the problem is defined as follows: multiple parties want to build support
vector machines on a data set that consists of the private data of all the
parties, but none of the parties is willing to disclose its raw data to the
others or to any other party. We develop a secure protocol, based on
homomorphic cryptography and random perturbation techniques, to tackle the
problem. An important feature of our approach is its distributed character:
there is no single, centralized authority that all parties need to trust.
Instead, the computation is distributed among the parties, and its structure,
together with the use of homomorphic encryption, ensures the privacy of the
data.
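
To make the two ingredients named above more concrete, the following is a
minimal sketch of an additively homomorphic scheme (a toy Paillier
cryptosystem) combined with a random additive mask. It is an illustration
only, not the protocol developed in Sect. 4: the choice of Paillier, the tiny
primes, and the function names (keygen, encrypt, decrypt, add, scalar_mul)
are all assumptions made for this example.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier keys from two small primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt 0 <= m < n under public key n using fresh randomness r."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext modulo n from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

def scalar_mul(n, c, k):
    """Raising a ciphertext to the power k multiplies its plaintext by k."""
    return pow(c, k, n * n)

# Hypothetical two-party example: the key owner never sees a or b in the clear.
n, priv = keygen()
a, b = 1234, 4321                      # private inputs (illustrative values)
c_sum = add(n, encrypt(n, a), encrypt(n, b))
total = decrypt(n, priv, c_sum)        # equals a + b

# Random perturbation: a party adds a random mask under encryption, so the
# decrypting party learns only the masked value; the masking party removes
# the mask afterwards.
mask = secrets.randbelow(n)
masked = decrypt(n, priv, add(n, c_sum, encrypt(n, mask)))
unmasked = (masked - mask) % n         # equals a + b again
```

Addition and multiplication by a known scalar are enough to evaluate dot
products between an encrypted vector and a plaintext vector, which is one
reason additively homomorphic schemes are a natural fit for kernel
computations over data split across parties.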

The paper is organized as follows. Related work is discussed in Sect. 2.
We describe the SVM training procedure in Sect. 3. We then present our
proposed secure protocols in Sect. 4. We give our conclusions in Sect. 5.

2 Related Work

2.1 Secure Multi-Party Computation

A Secure Multi-party Computation (SMC) problem deals with computing any
function on any input, in a distributed network where each participant holds
one of the inputs, while ensuring that no more information is revealed to a
