Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends

Databases Reference

In-Depth Information

computing the results of data mining algorithms. For example, the methods in

[54] discuss how to use to scalar dot product computation for frequent itemset

counting. The process of counting can also be achieved by using the secure

size of set intersection as described in [29]. Another method for association

rule mining discussed in [101] uses the secure scalar product over the vertical

bit representation of itemset inclusion in transactions, in order to compute the

frequency of the corresponding itemsets. This key step is applied repeatedly

within the framework of a roll up procedure of itemset counting. It has been

shown in [101] that this approach is quite effective in practice.

The approach of vertically partitioned mining has been extended to a vari-

ety of data mining applications such as decision trees [104], SVM Classification

[123], Naive Bayes Classifier [103], and k -means clustering [102]. A number

of theoretical results on the ability to learn different kinds of functions in

vertically partitioned databases with the use of cryptographic approaches are

discussed in [39].

4.3 Distributed Algorithms for k -Anonymity

In many cases, it is important to maintain k -anonymity across different dis-

tributed parties. In [56], a k -anonymous protocol for data which is vertically

partitioned across two parties is described. The broad idea is for the two

parties to agree on the quasi-identifier to generalize to the same value before

release. A similar approach is discussed in [110], in which the two parties agree

on how the generalization is to be performed before release.

In [125], an approach has been discussed for the case of horizontally par-

titioned data. The work in [125] discusses an extreme case in which each site

is a customer which owns exactly one tuple from the data. It is assumed that

the data record has both sensitive attributes and quasi-identifier attributes.

The solution uses encryption on the sensitive attributes. The sensitive values

can be decrypted only if therefore are at least k records with the same values

on the quasi-identifiers. Thus, k -anonymity is maintained.

The issue of k -anonymity is also important in the context of hiding iden-

tification in the context of distributed location based services [17, 48]. In this

case, k -anonymity of the user-identity is maintained even when the location

information is released. Such location information is often released when a

user may send a message at any point from a given location.

A similar issue arises in the context of communication protocols in which

the anonymity of senders (or receivers) may need to be protected. A message is

said to be sender k-anonymous , if it is guaranteed that an attacker can at most

narrow down the identity of the sender to k individuals. Similarly, a message

is said to be receiver k-anonymous , if it is guaranteed that an attacker can at

most narrow down the identity of the receiver to k individuals. A number of

such techniques have been discussed in [52, 116, 119].

Search WWH ::

Custom Search

Home