2.1 Privacy Quantification
The quantity used to measure privacy should indicate how closely the original value of an attribute can be estimated. The work in [2] uses a measure that defines privacy as follows: if the original value can be estimated with $c$% confidence to lie in the interval $[\alpha_1, \alpha_2]$, then the interval width $(\alpha_2 - \alpha_1)$ defines the amount of privacy at the $c$% confidence level. For example, if the perturbing additive is uniformly distributed in an interval of width $2\alpha$, then $\alpha$ is the amount of privacy at confidence level 50% and $2\alpha$ is the amount of privacy at confidence level 100%. However, this simple method of determining privacy can be subtly incomplete in some situations. This can best be explained by the following example.
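As a small illustration (ours, not from the original text; the function name is hypothetical), the measure of [2] for a uniform perturbing additive reduces to a single multiplication:

```python
def interval_privacy(additive_width: float, confidence: float) -> float:
    """Interval-width privacy at a given confidence level, for a perturbing
    additive distributed uniformly over an interval of width `additive_width`.

    For a uniform additive of width w, the shortest interval guaranteed to
    contain the original value with probability c has width c * w.
    """
    return confidence * additive_width

# Additive uniform on [-alpha, alpha], i.e. width 2*alpha; take alpha = 1:
assert interval_privacy(2.0, 0.5) == 1.0  # privacy alpha at 50% confidence
assert interval_privacy(2.0, 1.0) == 2.0  # privacy 2*alpha at 100% confidence
```

Note that this computation uses only the additive's distribution; as the example below shows, ignoring the distribution of the original data is exactly where the measure falls short.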
Example 1. Consider an attribute $X$ with the density function $f_X(x)$ given by:

$$
f_X(x) = \begin{cases}
0.5 & 0 \le x \le 1 \\
0.5 & 4 \le x \le 5 \\
0 & \text{otherwise}
\end{cases}
$$
Assume that the perturbing additive $Y$ is distributed uniformly in $[-1, 1]$. Then, according to the measure proposed in [2], the amount of privacy is 2 at confidence level 100%.
However, after performing the perturbation and the subsequent reconstruction, the density function $f_X(x)$ will be approximately revealed. Let us assume for a moment that a large amount of data is available, so that the distribution function is revealed to a high degree of accuracy. Since the (distribution of the) perturbing additive is publicly known, the two pieces of information can be combined to determine that if $Z \in [-1, 2]$, then $X \in [0, 1]$; whereas if $Z \in [3, 6]$, then $X \in [4, 5]$.
Thus, in each case, the value of $X$ can be localized to an interval of length 1. This means that the actual amount of privacy offered by the perturbing additive $Y$ is at most 1 at confidence level 100%. We use the qualifier 'at most' since $X$ can often be localized to an interval of length less than one. For example, if the value of $Z$ happens to be $-0.5$, then the value of $X$ can be localized to an even smaller interval of $[0, 0.5]$.
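This localization argument can be checked numerically. The following Python sketch (our own illustration; `sample_x` and `localize` are hypothetical helper names) draws perturbed values $Z = X + Y$ and intersects the inversion interval $[Z - 1, Z + 1]$ with the publicly reconstructed support of $X$:

```python
import random

def sample_x():
    # Bimodal density from Example 1: uniform on [0, 1] or [4, 5], each w.p. 0.5.
    return random.uniform(0, 1) if random.random() < 0.5 else random.uniform(4, 5)

def localize(z, alpha=1.0):
    """Intersect the inversion interval [z - alpha, z + alpha] with the known
    support of X; return the feasible sub-intervals for X."""
    support = [(0.0, 1.0), (4.0, 5.0)]
    lo, hi = z - alpha, z + alpha
    return [(max(lo, a), min(hi, b)) for (a, b) in support if max(lo, a) < min(hi, b)]

random.seed(0)
widths = []
for _ in range(10_000):
    x = sample_x()
    z = x + random.uniform(-1, 1)          # perturbed value Z = X + Y
    widths.append(sum(b - a for a, b in localize(z)))

# In every trial the feasible region for X has total length at most 1,
# even though the naive measure claims privacy 2 at 100% confidence.
assert max(widths) <= 1.0 + 1e-9
```

For instance, `localize(-0.5)` returns `[(0.0, 0.5)]`, the smaller interval mentioned above.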
This example illustrates that the method suggested in [2] does not take into account the distribution of the original data. In other words, the (aggregate) reconstruction of the attribute value also provides a certain level of knowledge which can be used to guess a data value to a higher level of accuracy. To quantify privacy accurately, we need a method that takes such side information into account.
A key privacy measure [5] is based on the differential entropy of a random variable. The differential entropy $h(A)$ of a random variable $A$ is defined as follows: