Databases Reference
In-Depth Information
be used for approximate query answering with the use of standard histogram
query processing methods. In [51], a method has been proposed for privacy-
preserving indexing of multi-dimensional data by using bucketizing of the
underlying attribute values in conjunction with encryption of identification
keys. We note that a choice of larger bucket sizes provides greater privacy but
less accuracy. Similarly, optimizing the bucket sizes for accuracy can lead to
reductions in privacy. This tradeoff has been studied in [51], and it has been
shown that reasonable query precision can be maintained at the expense of
partial disclosure.
In the class of methods which use summarization structures for inference
control, an interesting method was proposed by Mishra and Sandler in [81],
which uses pseudo-random sketches for privacy-preservation. In this technique
sketches are constructed from the data, and the sketch representations are
used to respond to user queries. In [81], it has been shown that the scheme
preserves privacy effectively, while continuing to be useful from a utility point
of view.
Finally, an important class of query inference control methods changes the
results of queries in order to preserve privacy. A classical method for aggregate
queries such as the sum or relative frequency is that of random sampling [32].
In this technique, a random sample of the data is used to compute such
aggregate functions. The random sampling approach makes it impossible for
the questioner to precisely control the formation of query sets. The advantage
of using a random sample is that the results of large queries are quite robust
(in terms of relative error ), but the privacy of individual records are preserved
because of high absolute error .
Another method for query inference control is by adding noise to the results
of queries. Clearly, the noise should be sucient that an adversary cannot use
small changes in the query arguments in order to infer facts about the base
data. In [41], an interesting technique has been presented in which the result
of a query is perturbed by an amount which depends upon the underlying
sensitivity of the query function. This sensitivity of the query function is
defined approximately by the change in the response to the query by changing
one argument to the function. An important theoretical result [19, 33, 39, 40]
shows that a surprisingly small amount of noise needs to be added to the result
of a query, provided that the number of queries is sublinear in the number of
database rows. With increasing sizes of databases today, this result provides
fairly strong guarantees on privacy. Such queries together with their slightly
noisy responses are referred to as the SuLQ primitive.
6 Limitations of Privacy: The Curse of Dimensionality
Many privacy-preserving data-mining methods are inherently limited by the
curse of dimensionality in the presence of public information. For example, the
Search WWH ::




Custom Search