and information sharing to support coalitions in which different organizations
and nations must share some, but not all, information. Information privacy
thus becomes extremely important: all the parties to the collaboration agree
to contribute their private data, but none of them wants the others, or any
outside party, to learn much about that data.
Without privacy concerns, all parties can send their data to a trusted
central place to conduct the mining. However, in situations with privacy
concerns, the parties may not trust anyone. We call this type of problem the
Privacy-preserving Collaborative Data Mining problem. As stated above, in
this paper we are interested in homogeneous collaboration, where each party
has the same set of attributes [15] but a different set of instances.
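The homogeneous setting can be pictured with a small sketch. The attribute names, party labels, and values below are hypothetical and serve only to show the layout: every party uses the same columns but holds its own rows, and the joint data set is the row-wise union that a trusted central place would hold if one existed.

```python
# Hypothetical shared schema: every party records the same attributes.
ATTRIBUTES = ("age", "income", "label")

# Each party privately holds different instances (rows) over that schema.
party_rows = {
    "A": [(34, 52000, +1), (51, 61000, -1)],
    "B": [(29, 48000, +1)],
    "C": [(45, 75000, -1), (38, 58000, +1)],
}

def shares_schema(rows):
    """Return True if every row has one value per shared attribute."""
    return all(len(row) == len(ATTRIBUTES) for row in rows)

# Homogeneous collaboration: same attributes at every party.
schema_ok = all(shares_schema(rows) for rows in party_rows.values())

# The joint data set is the row-wise union of the parties' instances.
# In the privacy-preserving setting, no single party ever sees this list.
joint = [row for rows in party_rows.values() for row in rows]
```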
Data mining includes a number of different tasks, such as association rule
mining, classification, and clustering. This paper studies how to learn support
vector machines. In the last few years, there has been a surge of interest
in Support Vector Machines (SVMs) [28, 29]. The SVM is a powerful methodology
for solving problems in nonlinear classification, function estimation, and
density estimation, and it has led to many other recent developments in
kernel-based learning methods in general [7, 24, 25]. SVMs were introduced
within the context of statistical learning theory and structural risk
minimization. As part of the SVM algorithm, one solves convex optimization
problems, typically quadratic programs. It has been empirically shown that
SVMs have good generalization performance on many applications such as
text categorization [13], face detection [20], and handwritten character
recognition [16]. Building on existing SVM learning techniques, we study the
problem of learning Support Vector Machines on private data. More precisely,
the problem is defined as follows: multiple parties want to build support vector
machines on a data set that consists of the private data of all the parties, but
none of the parties is willing to disclose its raw data to the others or to any
other party. We develop a secure protocol, based on homomorphic cryptography
and random perturbation techniques, to tackle the problem. An important
feature of our approach is its distributed character, i.e., there is no single,
centralized authority that all parties need to trust. Instead, the computation
is distributed among the parties, and its structure and the use of homomorphic
encryption ensure the privacy of the data.
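To make the two building blocks concrete, the following is a minimal sketch of additively homomorphic encryption combined with a random mask. It assumes the Paillier scheme, which is only one possible choice, since the text above says "homomorphic cryptography" without naming a scheme. The function names (keygen, encrypt, decrypt, add_encrypted) and the tiny primes are hypothetical, and the key size is far too small to be secure. It is not the protocol developed in this paper.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt m under public key n as g^m * r^n mod n^2 with g = n + 1."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext with the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)

n, priv = keygen(1009, 1013)  # assumed toy primes
x, y = 42, 17                 # private values held by two different parties
cx, cy = encrypt(n, x), encrypt(n, y)

# Homomorphic property: the sum is computed without decrypting x or y.
total = decrypt(n, priv, add_encrypted(n, cx, cy))

# Random perturbation: a party adds a secret mask before the key holder
# decrypts, so the key holder sees only the masked sum.
mask = secrets.randbelow(n)
c_masked = add_encrypted(n, add_encrypted(n, cx, cy), encrypt(n, mask))
masked = decrypt(n, priv, c_masked)
unmasked = (masked - mask) % n  # only the party that knows the mask can do this
```

In this sketch the key holder learns only the masked value, while the masking party never sees the private key. Dividing knowledge in this way is what removes the need for a single trusted authority.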
The paper is organized as follows. Related work is discussed in Sect. 2.
We describe the SVM training procedure in Sect. 3. We then present our
proposed secure protocols in Sect. 4. We give our conclusions in Sect. 5.
2 Related Work
2.1 Secure Multi-Party Computation
A Secure Multi-party Computation (SMC) problem deals with computing any
function on any input, in a distributed network where each participant holds
one of the inputs, while ensuring that no more information is revealed to a