$$(t[1], t[2], \ldots, t[d], j, v, c_j(v)) \tag{8}$$
where $j$ is the ID of the QI-group including $t$ (i.e., $t \in QI_j$), $v$ an
$A^s$ value, and $c_j(v)$ the number of tuples in $QI_j$ with $A^s$ value $v$.
Then, from an adversary's perspective,
$$\Pr\{t[d+1] = v\} = c_j(v) / |QI_j| \tag{9}$$
where $|QI_j|$ denotes the size of $QI_j$.
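As a minimal sketch of Equation 9, the snippet below derives the distribution of sensitive values for one QI-group from a sensitive table. The dictionary layout and the function name `sensitive_distribution` are assumptions made for illustration. Only $c_2(\text{flu}) = 2$ and $|QI_2| = 4$ are taken from the running example; every other entry is a hypothetical placeholder.

```python
from fractions import Fraction

# Hypothetical sensitive table (ST): group ID -> {sensitive value: count c_j(v)}.
# Only c_2("flu") = 2 and |QI_2| = 4 come from the text; the rest are placeholders.
ST = {
    1: {"disease_a": 2, "disease_b": 2},
    2: {"flu": 2, "disease_c": 1, "disease_d": 1},
}

def sensitive_distribution(st, j):
    """Return {v: c_j(v) / |QI_j|} for QI-group j, following Equation 9."""
    counts = st[j]
    group_size = sum(counts.values())  # |QI_j| equals the sum of the group's counts
    return {v: Fraction(c, group_size) for v, c in counts.items()}

dist = sensitive_distribution(ST, 2)
print(dist["flu"])  # 1/2
```

Exact fractions are used instead of floats so that the probabilities of a group sum to exactly 1.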
Corollary 1 ([18]). Given a pair of QIT and ST, an adversary can correctly
re-construct any tuple $t \in T$ with a probability at most $1/l$.
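The bound in Corollary 1 can be checked numerically. The sketch below assumes that the $1/l$ guarantee holds exactly when, in every QI-group, the most frequent sensitive value accounts for at most $1/l$ of the group's tuples; that condition is not stated in this passage and is an assumption here. The table contents (apart from the flu count in group 2) and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical ST: group ID -> {sensitive value: count}. Placeholder data except
# for c_2("flu") = 2 and |QI_2| = 4, which come from the running example.
ST = {
    1: {"disease_a": 2, "disease_b": 2},
    2: {"flu": 2, "disease_c": 1, "disease_d": 1},
}

def max_reconstruction_probability(st):
    """Largest c_j(v) / |QI_j| over all groups j and sensitive values v."""
    return max(
        Fraction(c, sum(counts.values()))
        for counts in st.values()
        for c in counts.values()
    )

def satisfies_bound(st, l):
    """True if no tuple can be re-constructed with probability above 1/l."""
    return max_reconstruction_probability(st) <= Fraction(1, l)

print(max_reconstruction_probability(ST))  # 1/2
print(satisfies_bound(ST, 2))              # True
print(satisfies_bound(ST, 3))              # False
```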
Corollary 1 gives the privacy protection guarantee at the tuple level. It
is also necessary to discuss the corresponding guarantee at the individual
level, since in practice multiple individuals may have the same QI-values, thus
complicating the privacy-attack process performed by an adversary.
To explain this, consider that an adversary has the age 65 and zipcode
25000 of Alice (the “owner” of tuple 7 in Table 3a), and wants to infer the
medical record of Alice from the QIT and ST in Tables 4a and 4b, respectively.
S/he consults the QIT, and sees that, in QI-group 2 (denoted as $QI_2$),
both tuples 6 and 7 match the QI-values of Alice. Hence, s/he examines two
scenarios.
First, assuming that tuple 6 belongs to Alice, the adversary uses Lemma 1
to derive the probability distribution for the tuple's disease value. According
to Equation 9, tuple 6 has probability $c_2(\text{flu}) / |QI_2| = 2/4 = 50\%$
to carry flu. Notice that, in the microdata, tuple 6 does not really belong to
Alice. However, it does not matter: the adversary may “happen to” use a wrong
tuple to infer the correct sensitive value of Alice! From tuple 6, the adversary
actually has 50% probability to figure out that Alice contracted flu.
In the second scenario, the adversary assumes that tuple 7 belongs to Alice,
through which (similar to tuple 6) s/he also has 50% probability to obtain
the real disease of Alice. Finally, (without further knowledge) the adversary
assumes that the two scenarios occur with the same likelihood $1/2$. Therefore,
the overall breach probability should be calculated as
$\frac{1}{2} \cdot 50\% + \frac{1}{2} \cdot 50\% = 50\%$, where $\frac{1}{2}$
and 50% have the same semantics as in the above discussion.
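The two-scenario calculation can be sketched as a weighted average over the candidate tuples. The function `individual_breach_probability` and the data layout are assumptions for illustration. The candidate list (tuples 6 and 7, both in QI-group 2) and the flu count follow the example; the remaining disease names are hypothetical placeholders.

```python
from fractions import Fraction

def individual_breach_probability(candidates, st, true_value):
    """Average, over equally likely candidate tuples, the probability that the
    candidate's QI-group yields the individual's true sensitive value.

    candidates: list of (tuple_id, group_id) pairs matching the known QI-values.
    st: group ID -> {sensitive value: count c_j(v)}.
    """
    weight = Fraction(1, len(candidates))  # each scenario is equally likely
    total = Fraction(0)
    for _tuple_id, j in candidates:
        counts = st[j]
        group_size = sum(counts.values())
        total += weight * Fraction(counts.get(true_value, 0), group_size)
    return total

# Hypothetical ST for QI-group 2; only the flu count and group size are from the text.
ST = {2: {"flu": 2, "disease_c": 1, "disease_d": 1}}
candidates = [(6, 2), (7, 2)]  # both tuples match Alice's QI-values

print(individual_breach_probability(candidates, ST, "flu"))  # 1/2
```

Because both candidates fall in the same QI-group, each term of the sum is identical, which is why the result equals the tuple-level probability.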
In fact, Lemma 1 shows that tuple 7 (the real tuple of Alice) can be
re-constructed with 50% likelihood. Namely, the breach probability at the
individual level coincides with that at the tuple level. This happens because
tuples 6 and 7 appear in the same QI-group. In general, as long as tuples
with identical QI-values always end up in the same QI-group (as is true for
“global-recoding” generalization [8]), the probabilities of the two levels are
always equivalent. In this case, it suffices to discuss only the (simpler) tuple
level; as a result, the individual level has not been addressed before (all the
existing generalization schemes adopt global recoding).
Anatomy, however, allows high flexibility in forming QI-groups such that
tuples with the same QI-values do not always belong to the same QI-group.