amount of research has been devoted to increasing the number of queries
which can be answered by the auditor without compromising privacy.
In [63], the problem of sum auditing on sub-cubes of the data cube is
studied, where a query expression is constructed as a string over 0, 1, and
*. The elements to be summed are those that match the
query string pattern. In [67], the problem of auditing a database of boolean
values is studied for the case of sum and max queries. In [18], an approach to
query auditing is discussed which combines the approach
of denying some queries with that of modifying queries in order to achieve privacy.
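A minimal sketch of the 0/1/* sub-cube sum query mechanism of [63] described above (the flat dictionary representation and function names here are illustrative assumptions, not the paper's implementation): each cell of a binary cube is addressed by a bit string, and a pattern over 0, 1, and * selects the cells to be summed, with * matching either bit.

```python
def matches(pattern: str, coord: str) -> bool:
    """True if the cell coordinate string matches the 0/1/* query pattern."""
    return all(p == "*" or p == c for p, c in zip(pattern, coord))

def sum_query(cube: dict[str, float], pattern: str) -> float:
    """Sum the values of all cells whose coordinates match the pattern."""
    return sum(v for coord, v in cube.items() if matches(pattern, coord))

# A tiny 3-dimensional binary cube: coordinate string -> cell value.
cube = {"000": 1.0, "001": 2.0, "010": 3.0, "011": 4.0,
        "100": 5.0, "101": 6.0, "110": 7.0, "111": 8.0}

print(sum_query(cube, "0**"))  # cells 000, 001, 010, 011 -> 10.0
print(sum_query(cube, "*1*"))  # cells 010, 011, 110, 111 -> 22.0
```

The auditor's task in this setting is then to decide, for a sequence of such pattern queries, whether the answered sums together determine any individual cell value.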
In [64], the authors show that query denials which depend upon the answer
to the current query can themselves leak information. The authors introduce the notion
of simulatable auditing for auditing sum and max queries. In [84], the authors
devise methods for auditing max queries and bags of max and min queries
under the partial and full disclosure settings. The authors also examine the
notion of utility in the context of auditing, and obtain results for sum queries
in the full disclosure setting.
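A toy illustration of the leakage problem that motivates simulatable auditing (the denial policy and names below are hypothetical, not the actual scheme of [64]): an auditor that decides denials from the true answer leaks through the denial itself, whereas a simulatable auditor's decision can be recomputed by the attacker from the queries alone and therefore reveals nothing about the data.

```python
def answer_dependent_auditor(db: list[int], indices: list[int]):
    """Hypothetical auditor over a boolean database: deny a sum query
    iff its true answer would pin down every queried entry exactly."""
    total = sum(db[i] for i in indices)
    if total == 0 or total == len(indices):  # all 0s or all 1s revealed
        return "DENIED"
    return total

db = [1, 1, 0]  # secret boolean entries
print(answer_dependent_auditor(db, [0, 1]))  # -> "DENIED"
# The denial itself leaks: the attacker now knows db[0] == db[1],
# even though no numeric answer was returned.

def simulatable_auditor(indices: list[int]):
    """Simulatable variant: decide using only the query, by checking the
    unsafe condition over every database consistent with past answers.
    For a boolean sum query with no history, the all-zeros database is
    always consistent and unsafe, so the query is denied regardless of
    the actual data -- and the denial carries no information."""
    return "DENIED"
```

This degenerate behavior of the deterministic simulatable variant is exactly why randomized denial decisions become attractive in that line of work.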
A number of techniques have also been proposed for the offline version of
the auditing problem. In [26], a number of variations of the offline auditing
problem have been studied. In the offline auditing problem, we are given a
sequence of queries which have been truthfully answered, and we need to deter-
mine if privacy has been breached. In [26], effective algorithms were proposed
for the sum, max, and max and min versions of the problem. On the other
hand, the combined sum and max version of the problem was shown to be NP-hard.
In [4], an offline auditing framework was proposed for determining whether
a database adheres to its disclosure properties. The key idea is to create an
audit expression which specifies sensitive table entries.
A number of techniques have also been proposed for sanitizing or random-
izing the data for query auditing purposes. These are fairly general models
of privacy, since they preserve the privacy of the data even when the en-
tire database is released. The standard methods of perturbation [2, 5] or
k-anonymity [98] can always be used, and it is guaranteed that an
adversary cannot derive anything more from the query answers than from
the released data itself. Thus, since a k-anonymity model guarantees a certain level of
privacy even when the entire database is made available, it will continue to
do so under any sequence of queries. In [23], a number of interesting methods
are discussed for measuring the effectiveness of sanitization schemes in terms
of balancing privacy and utility.
Instead of sanitizing the base data, it is possible to use summary constructs
on the data, and respond to queries using only the information encoded in
the summary constructs. Such an approach preserves privacy, as long as the
summary constructs do not reveal sensitive information about the underly-
ing records. A histogram based approach to data sanitization has been dis-
cussed in [23, 24]. In this technique the data is recursively partitioned into
multi-dimensional cells. The final output is the exact description of the cuts
along with the population of each cell. Clearly, this kind of description can