2.1 Privacy Quantification
The quantity used to measure privacy should indicate how closely the original value of an attribute can be estimated. The work in [2] uses a measure that defines privacy as follows: if the original value can be estimated with $c$% confidence to lie in the interval $[\alpha_1, \alpha_2]$, then the interval width $(\alpha_2 - \alpha_1)$ defines the amount of privacy at the $c$% confidence level. For example, if the perturbing additive is uniformly distributed in an interval of width $2\alpha$, then $\alpha$ is the amount of privacy at confidence level 50% and $2\alpha$ is the amount of privacy at confidence level 100%. However, this simple method of determining privacy can be subtly incomplete in some situations. This can best be explained by the following example.
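As a small illustration (ours, not from the original text; the function name is hypothetical), the measure of [2] for a uniform perturbing additive reduces to a single multiplication:

```python
def interval_privacy(additive_width: float, confidence: float) -> float:
    """Interval-width privacy at a given confidence level, for a perturbing
    additive distributed uniformly over an interval of width `additive_width`.

    For a uniform additive of width w, the shortest interval guaranteed to
    contain the original value with probability c has width c * w.
    """
    return confidence * additive_width

# Additive uniform on [-alpha, alpha], i.e. width 2*alpha; take alpha = 1:
assert interval_privacy(2.0, 0.5) == 1.0  # privacy alpha at 50% confidence
assert interval_privacy(2.0, 1.0) == 2.0  # privacy 2*alpha at 100% confidence
```

Note that this computation uses only the additive's distribution; as the example below shows, ignoring the distribution of the original data is exactly where the measure falls short.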
Example 1. Consider an attribute $X$ with the density function $f_X(x)$ given by:

$$
f_X(x) = \begin{cases}
0.5 & 0 \le x \le 1 \\
0.5 & 4 \le x \le 5 \\
0 & \text{otherwise}
\end{cases}
$$
Assume that the perturbing additive $Y$ is distributed uniformly in $[-1, 1]$. Then, according to the measure proposed in [2], the amount of privacy is 2 at confidence level 100%.
However, after performing the perturbation and the subsequent reconstruction, the density function $f_X(x)$ will be approximately revealed. Let us assume for a moment that a large amount of data is available, so that the distribution function is revealed to a high degree of accuracy. Since the (distribution of the) perturbing additive is publicly known, the two pieces of information can be combined to determine that if $Z \in [-1, 2]$, then $X \in [0, 1]$; whereas if $Z \in [3, 6]$, then $X \in [4, 5]$.
Thus, in each case, the value of $X$ can be localized to an interval of length 1. This means that the actual amount of privacy offered by the perturbing additive $Y$ is at most 1 at confidence level 100%. We use the qualifier 'at most' since $X$ can often be localized to an interval of length less than one. For example, if the value of $Z$ happens to be $-0.5$, then the value of $X$ can be localized to an even smaller interval of $[0, 0.5]$.
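This localization argument can be checked numerically. The following Python sketch (our own illustration; `sample_x` and `localize` are hypothetical helper names) draws perturbed values $Z = X + Y$ and intersects the inversion interval $[Z - 1, Z + 1]$ with the publicly reconstructed support of $X$:

```python
import random

def sample_x():
    # Bimodal density from Example 1: uniform on [0, 1] or [4, 5], each w.p. 0.5.
    return random.uniform(0, 1) if random.random() < 0.5 else random.uniform(4, 5)

def localize(z, alpha=1.0):
    """Intersect the inversion interval [z - alpha, z + alpha] with the known
    support of X; return the feasible sub-intervals for X."""
    support = [(0.0, 1.0), (4.0, 5.0)]
    lo, hi = z - alpha, z + alpha
    return [(max(lo, a), min(hi, b)) for (a, b) in support if max(lo, a) < min(hi, b)]

random.seed(0)
widths = []
for _ in range(10_000):
    x = sample_x()
    z = x + random.uniform(-1, 1)          # perturbed value Z = X + Y
    widths.append(sum(b - a for a, b in localize(z)))

# In every trial the feasible region for X has total length at most 1,
# even though the naive measure claims privacy 2 at 100% confidence.
assert max(widths) <= 1.0 + 1e-9
```

For instance, `localize(-0.5)` returns `[(0.0, 0.5)]`, the smaller interval mentioned above.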
This example illustrates that the method suggested in [2] does not take into account the distribution of the original data. In other words, the (aggregate) reconstruction of the attribute value also provides a certain level of knowledge which can be used to guess a data value to a higher level of accuracy. To quantify privacy accurately, we need a method that takes such side information into account.
A key privacy measure [5] is based on the differential entropy of a random variable. The differential entropy $h(A)$ of a random variable $A$ is defined as follows: