BFBR_{{u}, S_r}(V, A_g, R) = mult(r[S], [r]_g) / |[r]_g| - mult(r[S], R) / |R|   (r ∈ R).   (5)
Note that the frequency of a sensitive value s in the entire table R can diverge
widely from the frequency of s in the equivalence class of some r ∈ R. In a
worst-case scenario, when s is predominant in R (its frequency in R is close
to 1) but very infrequent in r's equivalence class, the belief revision for secret
S_r is close to 1, the maximum possible.
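To make the comparison concrete, the following minimal Python sketch computes the two frequencies that Eq. (5) contrasts: the frequency of r[S] inside the equivalence class [r]_g and its frequency in the whole table R. The toy table and the generalization g (which truncates a quasi-identifier to a prefix) are hypothetical and only serve as an illustration.

# Toy table R: pairs (quasi-identifier, sensitive value r[S]).
R = [
    ("47901", "flu"), ("47902", "flu"), ("47906", "flu"),
    ("47611", "flu"), ("47612", "cancer"), ("47618", "cancer"),
]

# Hypothetical generalization g: keep only the first three digits of the
# quasi-identifier; tuples that agree under g form one equivalence class [r]_g.
def g(qi):
    return qi[:3]

def frequencies(r, table, g):
    """Return (mult(r[S],[r]_g)/|[r]_g| , mult(r[S],R)/|R|) for tuple r."""
    qi, s = r
    eq_class = [t for t in table if g(t[0]) == g(qi)]            # [r]_g
    in_class = sum(t[1] == s for t in eq_class) / len(eq_class)  # in-class frequency
    in_table = sum(t[1] == s for t in table) / len(table)        # whole-table frequency
    return in_class, in_table

for r in R:
    in_class, in_table = frequencies(r, R, g)
    print(r, "in [r]_g:", round(in_class, 2), "in R:", round(in_table, 2))

A large gap between the two printed frequencies is exactly the divergence discussed in the worst-case scenario above.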
4.1 K-Anonymity
In this section, we expose the connection between the original work on k-
anonymity and the attacker's Bayesian belief revision. Casting the terminology
of [23, 24] in terms of the GBP model, we find that [23, 24] bounds the
attacker's belief revision by requiring the generalization function g to induce
only equivalence classes of cardinality at least k. In that case, g is called
k-anonymous, which we shall denote anon_k(g):

anon_k(g) := |[r]_g| ≥ k   (r ∈ R).
For instance, function g in Example 9 is 2-anonymous.
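As a small illustration (reusing the toy R and hypothetical g from the sketch above), the check below groups tuples by their generalized quasi-identifier and tests whether every induced equivalence class contains at least k tuples, i.e., whether anon_k(g) holds.

from collections import Counter

def is_k_anonymous(table, g, k):
    """anon_k(g): every equivalence class [r]_g induced by g has size >= k."""
    class_sizes = Counter(g(qi) for qi, _ in table)
    return all(size >= k for size in class_sizes.values())

print(is_k_anonymous(R, g, 2))   # True: both toy classes contain 3 tuples
print(is_k_anonymous(R, g, 4))   # False: k exceeds the smallest class size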
By the above discussion, k-anonymity immediately implies that for a given
occurrence of sensitive attribute value s in some tuple t of the anonymized
data, there are at least k distinct identities which could be associated with s
in the actual database R. Under assumptions A1, A2, and A3, the attacker's
odds of guessing that indeed r[ID] is the correct identity are at most 1/k.
Previous work has interpreted this fact as implying that the probability
of correctly guessing that identity id is associated in R with sensitive data value
s is at most 1/k. As pointed out in [16] and detailed below, this conclusion
is unjustified: it is caused by the confusion between the value of the sensitive
attributes and their occurrence. Specifically, if sensitive value s occurs l times
in r's equivalence class, then the probability that r[ID] is associated with
value s is the sum over all occurrences of s of the probability that r[ID] is as-
sociated with that occurrence, yielding l/|[r]_g|. This quantity can be arbitrarily
larger than 1/k, reaching 1 in the extreme case when all tuples in r's equiva-
lence class have the same sensitive value. This observation gives an alternative
explanation why k-anonymity provides no meaningful privacy guarantees in
general.
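The sketch below (again toy data, with the hypothetical g from the earlier sketches) spells out this distinction: the chance of guessing the correct occurrence is 1/|[r]_g|, but the chance of linking r[ID] to the value s is l/|[r]_g|, which reaches 1 for a homogeneous equivalence class even though the class itself is k-anonymous.

def value_linkage_probability(r, table, g):
    """l / |[r]_g| with l = mult(r[S], [r]_g): the probability that r[ID]
    is associated with the value r[S], summed over all occurrences of it."""
    qi, s = r
    eq_class = [t for t in table if g(t[0]) == g(qi)]   # [r]_g
    l = sum(t[1] == s for t in eq_class)                # occurrences of s in [r]_g
    return l / len(eq_class)

# A 3-anonymous but homogeneous class: the per-occurrence odds are 1/3,
# yet the sensitive value is disclosed with probability 1.
homogeneous = [("47901", "flu"), ("47902", "flu"), ("47906", "flu")]
print(value_linkage_probability(homogeneous[0], homogeneous, g))   # 1.0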
Before discussing in the following sections refinements of k-anonymity
which address this problem, we first articulate an implicit assumption un-
der which k-anonymity does bound the probability of guessing secret S_r by 1/k.
A4 For every r ∈ R, sensitive value r[S] occurs only once in [r]_g.
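A direct way to test A4 on a concrete table (toy R and hypothetical g as above) is to count, per equivalence class, how often each sensitive value appears; under A4 every count is 1, so l = 1 and the guessing probability l/|[r]_g| collapses back to 1/|[r]_g| ≤ 1/k.

from collections import Counter

def satisfies_A4(table, g):
    """A4: no sensitive value occurs more than once within any class [r]_g."""
    per_class = Counter((g(qi), s) for qi, s in table)
    return all(count == 1 for count in per_class.values())

print(satisfies_A4(R, g))   # False: "flu" occurs three times in the class "479"
print(satisfies_A4([("47901", "flu"), ("47902", "cancer")], g))   # True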