Databases Reference
In-Depth Information
computing the results of data mining algorithms. For example, the methods in
[54] discuss how to use to scalar dot product computation for frequent itemset
counting. The process of counting can also be achieved by using the secure
size of set intersection as described in [29]. Another method for association
rule mining discussed in [101] uses the secure scalar product over the vertical
bit representation of itemset inclusion in transactions, in order to compute the
frequency of the corresponding itemsets. This key step is applied repeatedly
within the framework of a roll up procedure of itemset counting. It has been
shown in [101] that this approach is quite effective in practice.
The approach of vertically partitioned mining has been extended to a vari-
ety of data mining applications such as decision trees [104], SVM Classification
[123], Naive Bayes Classifier [103], and k -means clustering [102]. A number
of theoretical results on the ability to learn different kinds of functions in
vertically partitioned databases with the use of cryptographic approaches are
discussed in [39].
4.3 Distributed Algorithms for k -Anonymity
In many cases, it is important to maintain k -anonymity across different dis-
tributed parties. In [56], a k -anonymous protocol for data which is vertically
partitioned across two parties is described. The broad idea is for the two
parties to agree on the quasi-identifier to generalize to the same value before
release. A similar approach is discussed in [110], in which the two parties agree
on how the generalization is to be performed before release.
In [125], an approach has been discussed for the case of horizontally par-
titioned data. The work in [125] discusses an extreme case in which each site
is a customer which owns exactly one tuple from the data. It is assumed that
the data record has both sensitive attributes and quasi-identifier attributes.
The solution uses encryption on the sensitive attributes. The sensitive values
can be decrypted only if therefore are at least k records with the same values
on the quasi-identifiers. Thus, k -anonymity is maintained.
The issue of k -anonymity is also important in the context of hiding iden-
tification in the context of distributed location based services [17, 48]. In this
case, k -anonymity of the user-identity is maintained even when the location
information is released. Such location information is often released when a
user may send a message at any point from a given location.
A similar issue arises in the context of communication protocols in which
the anonymity of senders (or receivers) may need to be protected. A message is
said to be sender k-anonymous , if it is guaranteed that an attacker can at most
narrow down the identity of the sender to k individuals. Similarly, a message
is said to be receiver k-anonymous , if it is guaranteed that an attacker can at
most narrow down the identity of the receiver to k individuals. A number of
such techniques have been discussed in [52, 116, 119].
Search WWH ::




Custom Search