$$h(A) = -\int_{\Omega_A} f_A(a) \log_2 f_A(a)\, da \qquad (1)$$
where $\Omega_A$ is the domain of $A$. It is well known that $h(A)$ is a measure of the uncertainty inherent in the value of $A$ [99]. It can easily be seen that for a random variable $U$ distributed uniformly between 0 and $a$, $h(U) = \log_2(a)$. For $a = 1$, $h(U) = 0$.
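As a quick numerical check (an illustrative sketch, not part of the source), the integral in (1) can be approximated by discretizing the density, which confirms $h(U) = \log_2(a)$ for a uniform variable:

```python
import numpy as np

def differential_entropy(pdf, lo, hi, n=100_000):
    """Approximate h = -integral f(x) log2 f(x) dx by the midpoint rule."""
    dx = (hi - lo) / n
    x = np.linspace(lo, hi, n, endpoint=False) + dx / 2
    f = pdf(x)
    mask = f > 0  # convention: 0 * log 0 = 0, so drop zero-density points
    return -np.sum(f[mask] * np.log2(f[mask])) * dx

a = 4.0
h_U = differential_entropy(lambda x: np.full_like(x, 1.0 / a), 0.0, a)
print(h_U)  # close to log2(4) = 2
```

For $a = 1$ the same routine returns 0, matching the statement above.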
In [5], it was proposed that $2^{h(A)}$ is a measure of the privacy inherent in the random variable $A$. This value is denoted by $\Pi(A)$. Thus, a random variable $U$ distributed uniformly between 0 and $a$ has privacy $\Pi(U) = 2^{\log_2(a)} = a$. For a general random variable $A$, $\Pi(A)$ denotes the length of the interval over which a uniformly distributed random variable has the same uncertainty as $A$.
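For a non-uniform example (a sketch using the standard closed form for Gaussian differential entropy, $h = \frac{1}{2}\log_2(2\pi e \sigma^2)$ bits, not an example from the source), a normal variable with standard deviation $\sigma$ has privacy $\Pi = 2^h = \sqrt{2\pi e}\,\sigma \approx 4.13\,\sigma$, i.e. it is as hard to localize as a uniform variable on an interval of that length:

```python
import math

def gaussian_privacy(sigma):
    """Pi(A) = 2^{h(A)} with h(A) = 0.5 * log2(2*pi*e*sigma**2) bits
    (standard closed form for the Gaussian differential entropy)."""
    h = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
    return 2 ** h

print(gaussian_privacy(1.0))  # sqrt(2*pi*e) ≈ 4.1327
```

Note that $\Pi$ scales linearly with $\sigma$, as one would expect of an interval length.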
Given a random variable $B$, the conditional differential entropy of $A$ is defined as follows:

$$h(A \mid B) = -\int_{\Omega_{A,B}} f_{A,B}(a,b) \log_2 f_{A \mid B=b}(a)\, da\, db \qquad (2)$$

Thus, the average conditional privacy of $A$ given $B$ is $\Pi(A \mid B) = 2^{h(A \mid B)}$. This motivates the following metric $P(A \mid B)$ for the conditional privacy loss of $A$, given $B$:

$$P(A \mid B) = 1 - \Pi(A \mid B)/\Pi(A) = 1 - 2^{h(A \mid B)}/2^{h(A)} = 1 - 2^{-I(A;B)} \qquad (3)$$

where $I(A;B) = h(A) - h(A \mid B) = h(B) - h(B \mid A)$. $I(A;B)$ is also known as the mutual information between the random variables $A$ and $B$. Clearly, $P(A \mid B)$ is the fraction of the privacy of $A$ which is lost by revealing $B$.
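To make metric (3) concrete (an illustrative sketch, not from the source), consider a jointly Gaussian pair with correlation $\rho$, for which the standard closed form $I(A;B) = -\frac{1}{2}\log_2(1-\rho^2)$ bits applies:

```python
import numpy as np

def privacy_loss_gaussian(rho):
    """P(A|B) = 1 - 2^{-I(A;B)} for jointly Gaussian A, B with correlation rho,
    using the closed form I(A;B) = -0.5 * log2(1 - rho**2) bits."""
    mutual_info = -0.5 * np.log2(1.0 - rho ** 2)
    return 1.0 - 2.0 ** (-mutual_info)

print(privacy_loss_gaussian(0.0))  # 0.0: independent variables, no privacy lost
print(privacy_loss_gaussian(0.9))  # higher correlation, larger fraction lost
```

Here $P(A \mid B)$ simplifies to $1 - \sqrt{1-\rho^2}$: it is 0 when $B$ reveals nothing about $A$ and approaches 1 as $B$ determines $A$.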
As an illustration, let us reconsider Example 1 given above. In this case, the differential entropy of $X$ is given by:

$$h(X) = -\int_{\Omega_X} f_X(x) \log_2 f_X(x)\, dx = 1 \qquad (4)$$
Thus the privacy of $X$ is $\Pi(X) = 2^1 = 2$. In other words, $X$ has as much privacy as a random variable distributed uniformly in an interval of length 2. The density function of the perturbed value $Z$ is given by $f_Z(z) = \int_{-\infty}^{\infty} f_X(\nu) f_Y(z - \nu)\, d\nu$. Using $f_Z(z)$, we can compute the differential entropy $h(Z)$ of $Z$. It turns out that $h(Z) = 9/4$. Therefore, we have:

$$I(X;Z) = h(Z) - h(Z \mid X) = 9/4 - h(Y) = 9/4 - 1 = 5/4 \qquad (5)$$
Here, the second equality $h(Z \mid X) = h(Y)$ follows from the fact that $X$ and $Y$ are independent and $Z = X + Y$. Thus, the fraction of privacy lost in this case is $P(X \mid Z) = 1 - 2^{-5/4} = 0.5796$. Therefore, after revealing $Z$, $X$ has privacy $\Pi(X \mid Z) = \Pi(X) \times (1 - P(X \mid Z)) = 2 \times (1.0 - 0.5796) = 0.8408$. This value is less than 1, since $X$ can be localized to an interval of length less than one for many values of $Z$.
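The arithmetic of this example can be checked directly from the quantities stated above ($h(X) = 1$, $h(Y) = 1$, $h(Z) = 9/4$):

```python
# Values taken from the worked example in the text.
h_X, h_Y, h_Z = 1.0, 1.0, 9.0 / 4.0

Pi_X = 2 ** h_X                   # privacy of X: 2
I_XZ = h_Z - h_Y                  # = h(Z) - h(Z|X), since h(Z|X) = h(Y)
P_XZ = 1 - 2 ** (-I_XZ)           # fraction of privacy lost, eq. (3)
Pi_X_given_Z = Pi_X * (1 - P_XZ)  # remaining privacy of X

# I = 1.25, P ≈ 0.5796, remaining privacy ≈ 0.8409 at full precision
# (the text's 0.8408 comes from multiplying with the rounded 0.5796).
print(I_XZ, P_XZ, Pi_X_given_Z)
```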