2.1 Privacy Quantification
The quantity used to measure privacy should indicate how closely the original
value of an attribute can be estimated. The work in [2] uses a measure that
defines privacy as follows: If the original value can be estimated with c% confidence to lie in the interval [α1, α2], then the interval width (α2 − α1) defines the amount of privacy at the c% confidence level. For example, if the perturbing additive is uniformly distributed in an interval of width 2α, then α is the amount of privacy at confidence level 50% and 2α is the amount of privacy at confidence level 100%. However, this simple method of determining privacy
can be subtly incomplete in some situations. This can be best explained by
the following example.
Example 1. Consider an attribute X with the density function f_X(x) given by:

f_X(x) = 0.5   for 0 ≤ x ≤ 1
         0.5   for 4 ≤ x ≤ 5
         0     otherwise
Assume that the perturbing additive Y is distributed uniformly between [−1, 1]. Then according to the measure proposed in [2], the amount of privacy is 2 at confidence level 100%.
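This interval-width measure can be illustrated with a small Python sketch. Here X is drawn uniformly from [0, 1] purely for illustration (the choice of distribution does not matter for the measure itself), and α = 1 as in the example: since |Y| ≤ α, the perturbed value Z only bounds X to [Z − α, Z + α], an interval of width 2α.

```python
import random

random.seed(0)
alpha = 1.0  # noise half-width; privacy at 100% confidence is 2 * alpha

for _ in range(10_000):
    x = random.uniform(0.0, 1.0)        # illustrative original value
    y = random.uniform(-alpha, alpha)   # perturbing additive Y ~ Uniform[-alpha, alpha]
    z = x + y
    # At 100% confidence, X is only known to lie in [z - alpha, z + alpha]
    assert z - alpha <= x <= z + alpha

print(2 * alpha)  # width of that interval: the amount of privacy, here 2
```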
However, after performing the perturbation and subsequent reconstruction, the density function f_X(x) will be approximately revealed. Let us assume for a moment that a large amount of data is available, so that the distribution function is revealed to a high degree of accuracy. Since the (distribution of the) perturbing additive is publicly known, the two pieces of information can be combined to determine that if the perturbed value Z = X + Y lies in [−1, 2], then X ∈ [0, 1]; whereas if Z ∈ [3, 6], then X ∈ [4, 5].
Thus, in each case, the value of X can be localized to an interval of length
1. This means that the actual amount of privacy offered by the perturbing
additive Y is at most 1 at confidence level 100%. We use the qualifier 'at
most' since X can often be localized to an interval of length less than one.
For example, if the value of Z happens to be −0.5, then the value of X can be localized to an even smaller interval of [0, 0.5].
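This localization argument can be checked empirically. The sketch below (assuming the two-interval density of Example 1 and Y uniform on [−1, 1]) verifies that every perturbed value Z pins X down to one of the two unit-length intervals, and computes the even tighter interval implied by an observed Z = −0.5.

```python
import random

random.seed(1)

def sample_x():
    # Density f_X: uniform on [0, 1] with prob 0.5, uniform on [4, 5] with prob 0.5
    if random.random() < 0.5:
        return random.uniform(0.0, 1.0)
    return random.uniform(4.0, 5.0)

for _ in range(10_000):
    x = sample_x()
    z = x + random.uniform(-1.0, 1.0)  # Y ~ Uniform[-1, 1]
    # Z <= 2 can only arise from the low mode; Z >= 3 only from the high one
    if z <= 2:
        assert 0.0 <= x <= 1.0
    else:
        assert 4.0 <= x <= 5.0

# Localizing X for a particular observation z0: since X = z0 - Y, intersect
# the feasible band [z0 - 1, z0 + 1] with the low mode of the support [0, 1]
z0 = -0.5
lo, hi = max(z0 - 1, 0.0), min(z0 + 1, 1.0)
print([lo, hi])  # -> [0.0, 0.5]
```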
This example illustrates that the method suggested in [2] does not take into account the distribution of the original data. In other words, the (aggregate) reconstruction of the attribute value also provides a certain level of knowledge which can be used to guess a data value to a higher level of accuracy. To accurately quantify privacy, we need a method which takes such side-information into account.
A key privacy measure [5] is based on the differential entropy of a random
variable. The differential entropy h(A) of a random variable A is defined as follows:

h(A) = −∫_{Ω_A} f_A(a) log₂ f_A(a) da,

where Ω_A is the domain of A.
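Taking the standard definition h(A) = −∫ f_A(a) log₂ f_A(a) da (logarithms base 2, so the result is in bits), a short Riemann-sum sketch computes the differential entropy of the density from Example 1: since f_X equals 0.5 on a support of total length 2, the result comes to 1 bit.

```python
import math

def f_x(x):
    # Two-interval density from Example 1
    if 0.0 <= x <= 1.0 or 4.0 <= x <= 5.0:
        return 0.5
    return 0.0

# Midpoint Riemann sum for h(X) = -integral of f(x) * log2 f(x) dx
dx = 1e-4
h = 0.0
for i in range(int(6 / dx)):
    x = (i + 0.5) * dx
    f = f_x(x)
    if f > 0:
        h += -f * math.log2(f) * dx

print(round(h, 6))  # -> 1.0, i.e. one bit of differential entropy
```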