Managing and Querying Encrypted Data - Database Security: Applications and Trends


a tunable algorithm that allows him to select a desired degree of tradeoff between performance and security. In the first phase, the values appearing in the attribute are divided into a user-specified number (say M) of buckets such that the average number of false positives is minimized over all possible range queries (i.e., queries with range predicates on the specified attribute). The buckets so created might not meet the required security criteria (i.e., some minimum level of entropy and variance), and therefore, in a second phase, the values within these optimal buckets are redistributed in a “controlled manner” into a new set of M buckets so as to increase the entropy and variance of the bucket-level distributions while admitting only up to a specified maximum degree of performance degradation. The tunable (user-chosen) parameter specifies this maximum allowed degree of quality degradation.
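The quantities named above (false positives for a range query, and the entropy and variance of a bucket) can be made concrete with a minimal sketch. It is not the algorithm of [30]: the partitioning here is plain equi-depth bucketing used as a stand-in for the optimal first-phase partitioning, the redistribution phase is not shown, and all function names are hypothetical.

```python
import math
from collections import Counter

def equi_depth_buckets(values, m):
    """Split the sorted values into m contiguous buckets of near-equal size.

    Stand-in for the optimal partitioning; assumes len(values) >= m.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // m:(i + 1) * n // m] for i in range(m)]

def entropy(bucket):
    """Shannon entropy (in bits) of the value distribution inside one bucket."""
    counts = Counter(bucket)
    total = len(bucket)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def variance(bucket):
    """Population variance of the values inside one bucket."""
    mean = sum(bucket) / len(bucket)
    return sum((v - mean) ** 2 for v in bucket) / len(bucket)

def false_positives(buckets, low, high):
    """Count retrieved values that fall outside the query range [low, high].

    A bucket is retrieved in full whenever its value range overlaps the query.
    """
    count = 0
    for bucket in buckets:
        if min(bucket) <= high and max(bucket) >= low:
            count += sum(1 for v in bucket if not (low <= v <= high))
    return count

buckets = equi_depth_buckets([1, 2, 3, 4, 5, 6, 7, 8], 2)
print(buckets)                         # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(entropy(buckets[0]))             # 2.0
print(variance(buckets[0]))            # 1.25
print(false_positives(buckets, 3, 5))  # 5
```

In this toy run the query [3, 5] touches both buckets, so five non-matching values are returned along with the three matching ones. Wider buckets would raise entropy and variance but also raise this false-positive count, which is the tradeoff the tunable parameter controls.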

Similar measures of disclosure risk have been proposed for privacy-preserving data publishing [36]. There too, the key technique for achieving anonymity is data generalization, which is akin to the partitioning approach in [30]. For more discussion of the choice of privacy measures and details of the partitioning and redistribution algorithms, the interested reader can refer to [30].

Discussion

In this section, only single-dimensional data was considered. Most real data sets have multiple attributes with various kinds of dependencies and correlations between the attributes. There may be functional dependencies (exact or partial) and correlations, as in multidimensional relational data, or even structural dependencies, as in XML data. Therefore, knowledge about one attribute might disclose the value of another via knowledge of such associations. The security-cost analysis for such data becomes significantly different. Also, the analysis presented in this section was carried out for the worst-case scenario, in which the complete value distribution of the bucket is assumed to be known to an adversary. In practice, it is unrealistic to assume that an adversary has exact knowledge of the complete distribution of a data set. Moreover, to learn the bucket-level joint distribution of data, the required size of the training set (in order to approximate the distribution to a given level of accuracy) grows exponentially with the number of attributes/dimensions. This makes the assumption of “complete bucket-level” knowledge of the distribution even more unrealistic for multidimensional data. [31] proposes a new approach to analyze the disclosure risk for multidimensional data and extends the work in [30] to this case.
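The exponential growth mentioned above can be illustrated with a rough counting argument. It is a simplification and not the analysis of [31]: it assumes every attribute takes the same number of distinct values, and it uses a fixed average number of samples per joint cell as a crude proxy for "a given level of accuracy". The function and parameter names are hypothetical.

```python
def joint_cells(values_per_attribute, num_attributes):
    """Number of cells in the joint distribution over all attributes."""
    return values_per_attribute ** num_attributes

def samples_needed(values_per_attribute, num_attributes, samples_per_cell):
    """Training-set size giving the assumed average number of samples per cell."""
    return samples_per_cell * joint_cells(values_per_attribute, num_attributes)

# With 10 distinct values per attribute and 5 samples per cell on average:
for d in (1, 2, 3, 4):
    print(d, samples_needed(10, d, 5))
# 1 50
# 2 500
# 3 5000
# 4 50000
```

Each added attribute multiplies the required training set by the number of values per attribute, which is why complete knowledge of a multidimensional bucket's distribution is a much stronger assumption than in the single-attribute case.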

3 Trust, Encryption, Key-management, Integrity & Data Confidentiality

Having discussed the querying aspects of encrypted data, let us look at some basic security-related issues that need to be addressed in a DAS application.
