BFBR_{R,{u},S_r}(V, A_g, |−|) = | mult(r[S], [r]_g) / |[r]_g| − mult(r[S], R) / |R| |,   for r ∈ R.   (5)
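Equation (5) can be illustrated with a minimal sketch. The representation is ours, not the source's: tuples are (quasi-identifier, sensitive value) pairs, and the generalization function g is modeled by a `cls_of` callable that maps a tuple to the key of its equivalence class.

```python
def belief_revision(table, cls_of, r):
    """Sketch of Eq. (5): the absolute gap between the frequency of r's
    sensitive value r[S] in r's equivalence class [r]_g and its
    frequency in the whole table R."""
    s = r[1]                                            # r[S]
    cls = [t for t in table if cls_of(t) == cls_of(r)]  # [r]_g
    mult_cls = sum(1 for t in cls if t[1] == s)         # mult(r[S], [r]_g)
    mult_tab = sum(1 for t in table if t[1] == s)       # mult(r[S], R)
    return abs(mult_cls / len(cls) - mult_tab / len(table))

# Tuples are (quasi-identifier, sensitive value); g groups on the first field.
table = [("a", "flu")] * 8 + [("b", "flu"), ("b", "cold")]
print(belief_revision(table, lambda t: t[0], ("b", "cold")))  # 0.4
```

Here "cold" occurs in half of its equivalence class but in only one tenth of the table, so the attacker's belief is revised by 0.4.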
Note that the frequency of a sensitive value s in the entire table R can diverge widely from the frequency of s in the equivalence class of some r ∈ R. In a worst-case scenario, when s is predominant in R (its frequency in R is close to 1) but very infrequent in r's equivalence class, the belief revision for secret S_r is considerably close to 1, which is the maximum possible.
4.1 K-Anonymity
In this section, we expose the connection between the original work on k-anonymity and the attacker's Bayesian belief revision. Casting the terminology of [23, 24] in terms of the GBP model, we find that [23, 24] bounds the attacker's belief revision by requiring the generalization function g to induce only equivalence classes of cardinality at least k. In that case, g is called k-anonymous, which we shall denote anon_k(g):
anon_k(g) := ∀ (r ∈ R) |[r]_g| ≥ k.

For instance, function g in Example 9 is 2-anonymous.
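The definition of anon_k(g) can be checked mechanically. A minimal sketch, reusing the illustrative representation in which tuples are (quasi-identifier, sensitive value) pairs and g is a callable grouping tuples into equivalence classes (the helper names are ours):

```python
from collections import Counter

def is_k_anonymous(table, cls_of, k):
    """anon_k(g): every equivalence class [r]_g induced by the
    generalization (here, cls_of) has cardinality at least k."""
    class_sizes = Counter(cls_of(t) for t in table)
    return all(size >= k for size in class_sizes.values())

# Two equivalence classes ("a" and "b"), each of cardinality 2.
table = [("a", "flu"), ("a", "cold"), ("b", "flu"), ("b", "flu")]
print(is_k_anonymous(table, lambda t: t[0], 2))  # True
print(is_k_anonymous(table, lambda t: t[0], 3))  # False
```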
By the above discussion, k-anonymity immediately implies that for a given occurrence of sensitive attribute value s in some tuple t of the anonymized data, there are at least k distinct identities which could be associated with s in the actual database R. Under assumptions A1, A2, and A3, the attacker's odds of guessing that indeed r[ID] is the correct identity are at most 1/k.
Previous work has interpreted this fact as implying that the probability of correctly guessing that identity id is associated in R to sensitive data value s is at most 1/k. As pointed out in [16] and detailed below, this conclusion is unjustified: it is caused by the confusion between the value of the sensitive attributes and their occurrence. Specifically, if sensitive value s occurs l times in r's equivalence class, then the probability that r[ID] is associated with value s is the sum over all occurrences of s of the probability that r[ID] is associated with that occurrence, yielding l / |[r]_g|. This quantity can be arbitrarily larger than 1/k, reaching 1 in the extreme case when all tuples in r's equivalence class have the same sensitive value. This observation gives an alternative explanation why k-anonymity provides no meaningful privacy guarantees in general.
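The gap between occurrence and value can be made concrete. A sketch, again under our illustrative (quasi-identifier, sensitive value) representation, showing that l / |[r]_g| can reach 1 even in a 2-anonymous table:

```python
def value_disclosure(table, cls_of, r):
    """Probability that r[ID] is linked to the *value* r[S]:
    l / |[r]_g|, where l counts the occurrences of r[S] in [r]_g."""
    cls = [t for t in table if cls_of(t) == cls_of(r)]  # [r]_g
    l = sum(1 for t in cls if t[1] == r[1])             # occurrences of r[S]
    return l / len(cls)

# 2-anonymous, yet every tuple in class "a" shares the sensitive value,
# so the value "flu" is disclosed with certainty for those identities.
table = [("a", "flu"), ("a", "flu"), ("b", "flu"), ("b", "cold")]
print(value_disclosure(table, lambda t: t[0], ("a", "flu")))  # 1.0
```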
Before discussing, in the following sections, refinements of k-anonymity which address this problem, we first articulate an implicit assumption under which k-anonymity does bound by 1/k the probability of guessing secret S_r.

A4  For every r ∈ R, sensitive value r[S] occurs only once in [r]_g.
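Assumption A4 is also mechanically checkable. A sketch under the same illustrative representation (helper names are ours): within every equivalence class, no sensitive value may repeat.

```python
from collections import Counter

def satisfies_a4(table, cls_of):
    """A4: in each equivalence class [r]_g, every sensitive value
    occurs exactly once, so a class of size k carries k distinct
    sensitive values and the 1/k bound applies to values as well."""
    classes = {}
    for t in table:
        classes.setdefault(cls_of(t), []).append(t[1])
    return all(max(Counter(vals).values()) == 1 for vals in classes.values())

print(satisfies_a4([("a", "flu"), ("a", "cold")], lambda t: t[0]))  # True
print(satisfies_a4([("a", "flu"), ("a", "flu")], lambda t: t[0]))   # False
```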