BFBR_{R,{u},S_r}(V, A_g, |−|) = | mult(r[S], [r]_g) / |[r]_g| − mult(r[S], R) / |R| |,   for r ∈ R.   (5)
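Equation (5) can be illustrated with a minimal sketch. The representation is ours, not the source's: tuples are (quasi-identifier, sensitive value) pairs, and the generalization function g is modeled by a `cls_of` callable that maps a tuple to the key of its equivalence class.

```python
def belief_revision(table, cls_of, r):
    """Sketch of Eq. (5): the absolute gap between the frequency of r's
    sensitive value r[S] in r's equivalence class [r]_g and its
    frequency in the whole table R."""
    s = r[1]                                            # r[S]
    cls = [t for t in table if cls_of(t) == cls_of(r)]  # [r]_g
    mult_cls = sum(1 for t in cls if t[1] == s)         # mult(r[S], [r]_g)
    mult_tab = sum(1 for t in table if t[1] == s)       # mult(r[S], R)
    return abs(mult_cls / len(cls) - mult_tab / len(table))

# Tuples are (quasi-identifier, sensitive value); g groups on the first field.
table = [("a", "flu")] * 8 + [("b", "flu"), ("b", "cold")]
print(belief_revision(table, lambda t: t[0], ("b", "cold")))  # 0.4
```

Here "cold" occurs in half of its equivalence class but in only one tenth of the table, so the attacker's belief is revised by 0.4.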
Note that the frequency of a sensitive value s in the entire table R can diverge widely from the frequency of s in the equivalence class of some r ∈ R. In a worst-case scenario, when s is predominant in R (its frequency in R is close to 1) but very infrequent in r's equivalence class, the belief revision for secret S_r is considerably close to 1, which is the maximum possible.
4.1 K-Anonymity
In this section, we expose the connection between the original work on k-anonymity and the attacker's Bayesian belief revision. Casting the terminology of [23, 24] in terms of the GBP model, we find that [23, 24] bounds the attacker's belief revision by requiring the generalization function g to induce only equivalence classes of cardinality at least k. In that case, g is called k-anonymous, which we shall denote anon_k(g):
anon_k(g) := ∀ (r ∈ R) |[r]_g| ≥ k.

For instance, function g in Example 9 is 2-anonymous.
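The definition of anon_k(g) can be checked mechanically. A minimal sketch, reusing the illustrative representation in which tuples are (quasi-identifier, sensitive value) pairs and g is a callable grouping tuples into equivalence classes (the helper names are ours):

```python
from collections import Counter

def is_k_anonymous(table, cls_of, k):
    """anon_k(g): every equivalence class [r]_g induced by the
    generalization (here, cls_of) has cardinality at least k."""
    class_sizes = Counter(cls_of(t) for t in table)
    return all(size >= k for size in class_sizes.values())

# Two equivalence classes ("a" and "b"), each of cardinality 2.
table = [("a", "flu"), ("a", "cold"), ("b", "flu"), ("b", "flu")]
print(is_k_anonymous(table, lambda t: t[0], 2))  # True
print(is_k_anonymous(table, lambda t: t[0], 3))  # False
```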
By the above discussion, k-anonymity immediately implies that for a given occurrence of sensitive attribute value s in some tuple t of the anonymized data, there are at least k distinct identities which could be associated with s in the actual database R. Under assumptions A1, A2, and A3, the attacker's odds of guessing that indeed r[ID] is the correct identity are at most 1/k.
Previous work has interpreted this fact as implying that the probability of correctly guessing that identity id is associated in R to sensitive data value s is at most 1/k. As pointed out in [16] and detailed below, this conclusion is unjustified: it is caused by the confusion between the value of the sensitive attributes and their occurrence. Specifically, if sensitive value s occurs l times in r's equivalence class, then the probability that r[ID] is associated with value s is the sum over all occurrences of s of the probability that r[ID] is associated with that occurrence, yielding l / |[r]_g|. This quantity can be arbitrarily larger than 1/k, reaching 1 in the extreme case when all tuples in r's equivalence class have the same sensitive value. This observation gives an alternative explanation why k-anonymity provides no meaningful privacy guarantees in general.
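The gap between occurrence and value can be made concrete. A sketch, again under our illustrative (quasi-identifier, sensitive value) representation, showing that l / |[r]_g| can reach 1 even in a 2-anonymous table:

```python
def value_disclosure(table, cls_of, r):
    """Probability that r[ID] is linked to the *value* r[S]:
    l / |[r]_g|, where l counts the occurrences of r[S] in [r]_g."""
    cls = [t for t in table if cls_of(t) == cls_of(r)]  # [r]_g
    l = sum(1 for t in cls if t[1] == r[1])             # occurrences of r[S]
    return l / len(cls)

# 2-anonymous, yet every tuple in class "a" shares the sensitive value,
# so the value "flu" is disclosed with certainty for those identities.
table = [("a", "flu"), ("a", "flu"), ("b", "flu"), ("b", "cold")]
print(value_disclosure(table, lambda t: t[0], ("a", "flu")))  # 1.0
```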
Before discussing, in the following sections, refinements of k-anonymity which address this problem, we first articulate an implicit assumption under which k-anonymity does bound by 1/k the probability of guessing secret S_r.

A4  For every r ∈ R, sensitive value r[S] occurs only once in [r]_g.
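Assumption A4 is also mechanically checkable. A sketch under the same illustrative representation (helper names are ours): within every equivalence class, no sensitive value may repeat.

```python
from collections import Counter

def satisfies_a4(table, cls_of):
    """A4: in each equivalence class [r]_g, every sensitive value
    occurs exactly once, so a class of size k carries k distinct
    sensitive values and the 1/k bound applies to values as well."""
    classes = {}
    for t in table:
        classes.setdefault(cls_of(t), []).append(t[1])
    return all(max(Counter(vals).values()) == 1 for vals in classes.values())

print(satisfies_a4([("a", "flu"), ("a", "cold")], lambda t: t[0]))  # True
print(satisfies_a4([("a", "flu"), ("a", "flu")], lambda t: t[0]))   # False
```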