Database Reference
In-Depth Information
compute the scalar product of their two vectors : A B. A may encrypt the data with
homomorphic encryption h and compute
ha ha h , and send those
to B . B will then multiply each of these encryptions by his b i , but only for those i
for which b i
( ,( , ,( )
n
1
2
0, and due to the property of homomorphic encryption that h(x)
×
h(y)
eAB
(
++
...
A B
)
mm
= h(x+y) , B obtains
. This expression is then passed
back to A for decoding of the result, which is A B (we have omitted some details
here, having to do with the use of digital envelopes and with doing all the arith-
metic operations modulo and agreed X ). This is generalized from two to multiple
parties, and from dealing with binary values to any symbolic or numerical values.
Furthermore, complex protocols are designed to implement in the above manner
many data mining operations: different classifiers, e.g. decision trees (Vaidya,
Clifton et al. 2008), support vector machines (Zhan 2007), Bayesian classifiers
(Yang and Wright 2006), k -nearest neighbour (Zhan, Chang et al. 2005), cluster-
ing (Vaidya and Clifton 2003), etc., in both horizontal and vertical partitioning
settings.
A big advantage of these approaches is that data remains unchanged, and there-
fore there is no loss of data quality. A disadvantage is the significant overhead,
due to the multiple encryption/decryption operations, and even more importantly
to the need to communicate securely between the parties to exchange the keys
for these encryptions. (Yang, Wright et al. 2006) presents an empirical study that
analyzes the computational and communication overhead of the cryptographic ap-
proaches outlined above, and shows that the computational overhead is quite
heavy: vector product for vector of size O(10**5) was exceeding 1 hour (although
specialized optimization would decrease this to seconds). For these reasons that
there are no reported large-scale implementations and applications of these ap-
proaches with real-life data.
A recent important cryptographic result by (Gentry 2010) generalizes the con-
cept of cryptographic encryption to the functionality of computation of any type
on encrypted data, and then decrypting the result. While there is ongoing research
to build an efficient (or even feasible) implementation of these results, it is clear
that even if only partially successful, the approach will go a long way towards en-
suring data security in a cloud environment. It will also support data privacy in the
distributed context that we are discussing here.
j
j
j
j
1
1
11.5 New Challenges for Data Privacy
As connectivity becomes ubiquitous and computing technology permeates human
life, more and more data is produced as people go about their daily life. Mining of
this data can result in novel and unexpected threats to privacy.
Data from mobile devices may represent one such threat (Wang, Pedreschi et
al. 2011). Collecting this kind of data may result in highly useful services. For in-
stance, collecting data from vehicles' or persons' trajectories in large cities may
provide traffic and urban planners with patterns that can be used to build new
roads, manage traffic, introduce corrective highway tariffs etc. At the same time,
Search WWH ::




Custom Search