Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends

Databases Reference

In-Depth Information

be used for approximate query answering with the use of standard histogram

query processing methods. In [51], a method has been proposed for privacy-

preserving indexing of multi-dimensional data by using bucketizing of the

underlying attribute values in conjunction with encryption of identification

keys. We note that a choice of larger bucket sizes provides greater privacy but

less accuracy. Similarly, optimizing the bucket sizes for accuracy can lead to

reductions in privacy. This tradeoff has been studied in [51], and it has been

shown that reasonable query precision can be maintained at the expense of

partial disclosure.

In the class of methods which use summarization structures for inference

control, an interesting method was proposed by Mishra and Sandler in [81],

which uses pseudo-random sketches for privacy-preservation. In this technique

sketches are constructed from the data, and the sketch representations are

used to respond to user queries. In [81], it has been shown that the scheme

preserves privacy effectively, while continuing to be useful from a utility point

of view.

Finally, an important class of query inference control methods changes the

results of queries in order to preserve privacy. A classical method for aggregate

queries such as the sum or relative frequency is that of random sampling [32].

In this technique, a random sample of the data is used to compute such

aggregate functions. The random sampling approach makes it impossible for

the questioner to precisely control the formation of query sets. The advantage

of using a random sample is that the results of large queries are quite robust

(in terms of relative error ), but the privacy of individual records are preserved

because of high absolute error .

Another method for query inference control is by adding noise to the results

of queries. Clearly, the noise should be sucient that an adversary cannot use

small changes in the query arguments in order to infer facts about the base

data. In [41], an interesting technique has been presented in which the result

of a query is perturbed by an amount which depends upon the underlying

sensitivity of the query function. This sensitivity of the query function is

defined approximately by the change in the response to the query by changing

one argument to the function. An important theoretical result [19, 33, 39, 40]

shows that a surprisingly small amount of noise needs to be added to the result

of a query, provided that the number of queries is sublinear in the number of

database rows. With increasing sizes of databases today, this result provides

fairly strong guarantees on privacy. Such queries together with their slightly

noisy responses are referred to as the SuLQ primitive.

6 Limitations of Privacy: The Curse of Dimensionality

Many privacy-preserving data-mining methods are inherently limited by the

curse of dimensionality in the presence of public information. For example, the

Search WWH ::

Custom Search

Home