$$h(A) = -\int_{\Omega_A} f_A(a) \log_2 f_A(a)\, da \qquad (1)$$
where $\Omega_A$ is the domain of $A$. It is well known that $h(A)$ is a measure of the uncertainty inherent in the value of $A$ [99]. It can easily be seen that for a random variable $U$ distributed uniformly between 0 and $a$, $h(U) = \log_2(a)$. For $a = 1$, $h(U) = 0$.
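As a quick numerical check (an illustrative sketch, not part of the source), the integral in (1) can be approximated by discretizing the density, which confirms $h(U) = \log_2(a)$ for a uniform variable:

```python
import numpy as np

def differential_entropy(pdf, lo, hi, n=100_000):
    """Approximate h = -integral f(x) log2 f(x) dx by the midpoint rule."""
    dx = (hi - lo) / n
    x = np.linspace(lo, hi, n, endpoint=False) + dx / 2
    f = pdf(x)
    mask = f > 0  # convention: 0 * log 0 = 0, so drop zero-density points
    return -np.sum(f[mask] * np.log2(f[mask])) * dx

a = 4.0
h_U = differential_entropy(lambda x: np.full_like(x, 1.0 / a), 0.0, a)
print(h_U)  # close to log2(4) = 2
```

For $a = 1$ the same routine returns 0, matching the statement above.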
In [5], it was proposed that $2^{h(A)}$ is a measure of the privacy inherent in the random variable $A$. This value is denoted by $\Pi(A)$. Thus, a random variable $U$ distributed uniformly between 0 and $a$ has privacy $\Pi(U) = 2^{\log_2(a)} = a$. For a general random variable $A$, $\Pi(A)$ denotes the length of the interval over which a uniformly distributed random variable has the same uncertainty as $A$.
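For a non-uniform example (a sketch using the standard closed form for Gaussian differential entropy, $h = \frac{1}{2}\log_2(2\pi e \sigma^2)$ bits, not an example from the source), a normal variable with standard deviation $\sigma$ has privacy $\Pi = 2^h = \sqrt{2\pi e}\,\sigma \approx 4.13\,\sigma$, i.e. it is as hard to localize as a uniform variable on an interval of that length:

```python
import math

def gaussian_privacy(sigma):
    """Pi(A) = 2^{h(A)} with h(A) = 0.5 * log2(2*pi*e*sigma**2) bits
    (standard closed form for the Gaussian differential entropy)."""
    h = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
    return 2 ** h

print(gaussian_privacy(1.0))  # sqrt(2*pi*e) ≈ 4.1327
```

Note that $\Pi$ scales linearly with $\sigma$, as one would expect of an interval length.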
Given a random variable $B$, the conditional differential entropy of $A$ is defined as follows:

$$h(A \mid B) = -\int_{\Omega_{A,B}} f_{A,B}(a,b) \log_2 f_{A \mid B=b}(a)\, da\, db \qquad (2)$$

Thus, the average conditional privacy of $A$ given $B$ is $\Pi(A \mid B) = 2^{h(A \mid B)}$. This motivates the following metric $P(A \mid B)$ for the conditional privacy loss of $A$, given $B$:

$$P(A \mid B) = 1 - \Pi(A \mid B)/\Pi(A) = 1 - 2^{h(A \mid B)}/2^{h(A)} = 1 - 2^{-I(A;B)} \qquad (3)$$

where $I(A;B) = h(A) - h(A \mid B) = h(B) - h(B \mid A)$. $I(A;B)$ is also known as the mutual information between the random variables $A$ and $B$. Clearly, $P(A \mid B)$ is the fraction of the privacy of $A$ which is lost by revealing $B$.
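To make metric (3) concrete (an illustrative sketch, not from the source), consider a jointly Gaussian pair with correlation $\rho$, for which the standard closed form $I(A;B) = -\frac{1}{2}\log_2(1-\rho^2)$ bits applies:

```python
import numpy as np

def privacy_loss_gaussian(rho):
    """P(A|B) = 1 - 2^{-I(A;B)} for jointly Gaussian A, B with correlation rho,
    using the closed form I(A;B) = -0.5 * log2(1 - rho**2) bits."""
    mutual_info = -0.5 * np.log2(1.0 - rho ** 2)
    return 1.0 - 2.0 ** (-mutual_info)

print(privacy_loss_gaussian(0.0))  # 0.0: independent variables, no privacy lost
print(privacy_loss_gaussian(0.9))  # higher correlation, larger fraction lost
```

Here $P(A \mid B)$ simplifies to $1 - \sqrt{1-\rho^2}$: it is 0 when $B$ reveals nothing about $A$ and approaches 1 as $B$ determines $A$.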
As an illustration, let us reconsider Example 1 given above. In this case, the differential entropy of $X$ is given by:

$$h(X) = -\int_{\Omega_X} f_X(x) \log_2 f_X(x)\, dx = 1 \qquad (4)$$
Thus the privacy of $X$ is $\Pi(X) = 2^1 = 2$. In other words, $X$ has as much privacy as a random variable distributed uniformly in an interval of length 2. The density function of the perturbed value $Z$ is given by $f_Z(z) = \int_{-\infty}^{\infty} f_X(\nu) f_Y(z - \nu)\, d\nu$. Using $f_Z(z)$, we can compute the differential entropy $h(Z)$ of $Z$. It turns out that $h(Z) = 9/4$. Therefore, we have:

$$I(X;Z) = h(Z) - h(Z \mid X) = 9/4 - h(Y) = 9/4 - 1 = 5/4 \qquad (5)$$
Here, the second equality $h(Z \mid X) = h(Y)$ follows from the fact that $X$ and $Y$ are independent and $Z = X + Y$. Thus, the fraction of privacy lost in this case is $P(X \mid Z) = 1 - 2^{-5/4} = 0.5796$. Therefore, after revealing $Z$, $X$ has privacy $\Pi(X \mid Z) = \Pi(X) \times (1 - P(X \mid Z)) = 2 \times (1.0 - 0.5796) = 0.8408$. This value is less than 1, since $X$ can be localized to an interval of length less than one for many values of $Z$.
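The arithmetic of this example can be checked directly from the quantities stated above ($h(X) = 1$, $h(Y) = 1$, $h(Z) = 9/4$):

```python
# Values taken from the worked example in the text.
h_X, h_Y, h_Z = 1.0, 1.0, 9.0 / 4.0

Pi_X = 2 ** h_X                   # privacy of X: 2
I_XZ = h_Z - h_Y                  # = h(Z) - h(Z|X), since h(Z|X) = h(Y)
P_XZ = 1 - 2 ** (-I_XZ)           # fraction of privacy lost, eq. (3)
Pi_X_given_Z = Pi_X * (1 - P_XZ)  # remaining privacy of X

# I = 1.25, P ≈ 0.5796, remaining privacy ≈ 0.8409 at full precision
# (the text's 0.8408 comes from multiplying with the rounded 0.5796).
print(I_XZ, P_XZ, Pi_X_given_Z)
```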