Managing and Querying Encrypted Data - Database Security: Applications and Trends

Databases Reference

In-Depth Information

Such a tradeoff has been studied in [30] where authors develop a strategy to

minimize the disclosure with constraint on the performance degradation 6 .

Let us take the case where bucket based generalization is performed over

a single dimensional ordered data set, e.g., a numeric attribute and the query

class is that of 1-dimensional range queries. The authors in [30] propose en-

tropy and variance of the value distributions in the bucket as appropriate

measure of (the inverse) disclosure risk. Entropy captures the notion of uncer-

tainty associated with a random element chosen with a probability that follows

a certain distribution. The higher the value of entropy of a distribution (i.e.,

larger the number of distinct values and more uniform the frequencies, larger

is the value of the entropy), greater is the uncertainty regarding the true value

of the element. For example, given a domain having 5 distinct values and the

data set having 20 data points, the entropy is maximized if all 5 values appear

equal number of time, i.e. each value has a frequency of 4.

In a secure index-based scheme, the adversary only sees the bucket label B

of a data element t . Therefore, if the adversary (somehow) learns the distribu-

tion (frequencies) of values within B , he can guess the true value (say v ∗ )of

t with a probability equal to the fractional proportion of elements with value

v ∗ within the bucket. The notion of uncertainty regarding the true value can

be captured in an aggregate manner by the entropy of the value distribution

within B . Entropy of a discrete random variable X taking values x i =1 ,...,n

with corresponding probabilities p i ,i =1 ,...,n is given by:

n

Entropy ( X )= H ( X )=

−

p i log 2 ( p i )

i =1

If the domain of the attribute has an order defined on it as in the case

of a numeric attribute, the above definition of entropy does not capture the

notion of distance between two values. In the worst case model, since the value

distribution is assumed to be known to the adversary, greater the spread of

each bucket distribution, better is the protection against disclosure. Therefore,

the authors propose variance of the bucket distribution as the second (inverse)

measure of disclosure risk associated with each bucket. That is, higher the

variance of the value distribution, lower is the disclosure risk.

n

E ( X )) 2 , where E ( X )= 1

n

Variance ( X )=

p i ( x i −

p i x i

i =1

After specifying these measures of disclosure-risk, [30] propose a 2-phase

algorithm for creating the secure indices. The goal is to provide the data owner

6 Notice the dual of the problem maximize performance with a constraint on

information disclosure would also be addressed once we agree on the metric for

information disclosure. However, such an articulation of the problem has not been

studied in the literature.

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home