Databases Reference
In-Depth Information
Name       Age  Sex  Zipcode
Ada        61   F    54000
Alice      65   F    25000
Bella      65   F    25000
Emily      67   F    33000
Stephanie  70   F    30000
...        ...  ...  ...
Table 5. The voter registration list (publicly accessible)
To compute $\mathrm{Pr}_{A2}(\mathrm{Alice}_{qi})$, an adversary typically needs to
consult another external database [19], which relates QI-values to concrete
personal identities for all the persons in the microdata, perhaps together
with some other people. An example of such an external source is a voter
registration list, partially demonstrated in Table 5, where the record of
Emily is italicized to indicate that she is not involved in the microdata of
Table 3a. In this scenario, generalization and anatomy behave differently.
Specifically, judging from (the QI-values of tuples 5-8 in) the generalized
Table 3a, the adversary sees that each person shown in Table 5 could be
involved in the microdata with equal likelihood (four tuples shared among
five candidates), and hence calculates $\mathrm{Pr}_{A2}(\mathrm{Alice}_{qi})$ as 4/5.
On the other hand, given the anatomized Table 4, the adversary concludes that
$\mathrm{Pr}_{A2}(\mathrm{Alice}_{qi}) = 1$ (here s/he can figure out that Emily is
definitely absent from the microdata). As a result, generalization provides a
stronger overall privacy-preserving guarantee. Nevertheless, since anatomy
ensures $\mathrm{Pr}_{breach}(\mathrm{Alice}_{s} \mid A2) \le 1/l$, it also secures
the same upper bound $1/l$ for Formula 10.
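
The arithmetic behind the two values of $\mathrm{Pr}_{A2}(\mathrm{Alice}_{qi})$ can be sketched in Python. This is only an illustration under stated assumptions: the voter rows are those shown in Table 5, the QI-group is taken to hold four tuples (tuples 5-8 of Table 3a) whose generalized values cover all five voters, the exact QI-values published by anatomy are assumed to be the Table 5 rows minus Emily, and Formula 10 is assumed to be the product of $\mathrm{Pr}_{A2}$ and the conditional breach probability. The function names and the value l = 2 are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Rows shown in Table 5 (public voter registration list): name, age, sex, zipcode.
VOTER_LIST = [
    ("Ada", 61, "F", 54000),
    ("Alice", 65, "F", 25000),
    ("Bella", 65, "F", 25000),
    ("Emily", 67, "F", 33000),
    ("Stephanie", 70, "F", 30000),
]

# Assumption: the QI-group holding Alice has 4 tuples (tuples 5-8 of Table 3a),
# and its generalized QI-values cover all five voters listed above.
GROUP_SIZE = 4

# Assumption: exact QI-values that anatomy would publish for that group,
# taken here to be the Table 5 rows minus Emily (who is not in the microdata).
PUBLISHED_QI = Counter(row[1:] for row in VOTER_LIST if row[0] != "Emily")


def pr_a2_generalization(voters, group_size):
    """Every matching voter is equally likely to own one of the group's tuples."""
    return Fraction(group_size, len(voters))


def pr_a2_anatomy(name, voters, published_qi):
    """Exact QI-values let the adversary rule out voters with no matching tuple."""
    qi = next(row[1:] for row in voters if row[0] == name)
    same_qi_voters = sum(1 for row in voters if row[1:] == qi)
    return Fraction(min(published_qi[qi], same_qi_voters), same_qi_voters)


def breach_bound(pr_a2, l):
    """Upper bound on the overall breach probability (Formula 10 assumed to be a product)."""
    return pr_a2 * Fraction(1, l)


if __name__ == "__main__":
    l = 2  # hypothetical diversity parameter
    gen = pr_a2_generalization(VOTER_LIST, GROUP_SIZE)
    ana = pr_a2_anatomy("Alice", VOTER_LIST, PUBLISHED_QI)
    print("generalization:", gen, "bound:", breach_bound(gen, l))
    print("anatomy:       ", ana, "bound:", breach_bound(ana, l))
```

Exact fractions are used so that 4/5 and 1 come out without floating-point rounding.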
Although generalization has the above advantage over anatomy, the advantage
cannot be leveraged in computing the published data. This is because the
publisher cannot predict or control the external database to be utilized by
an adversary, and therefore must guard against an "accurate" external source
that does not involve any person absent in the microdata. For instance, if
Table 5 did not contain Emily, the voter list would produce
$\mathrm{Pr}_{A2}(\mathrm{Alice}_{qi}) = 1$ in attacking the privacy of Alice from
Table 3a (instead of 4/5 as discussed earlier). In other words, to ensure a
maximum breach probability $p$ using generalization, we must still set $l$ to
$\lceil 1/p \rceil$, i.e., the same as in applying anatomy.
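
A minimal sketch of this parameter choice follows, assuming (as the text suggests) that the overall breach probability is bounded by $\mathrm{Pr}_{A2}$ times $1/l$. The function name `required_l` and the sample value of p are hypothetical. The optional `pr_a2` argument shows what a publisher could gain if the adversary's external source were known to be imperfect, which is exactly what cannot be relied on.

```python
import math
from fractions import Fraction


def required_l(p, pr_a2=Fraction(1)):
    """Smallest integer l such that pr_a2 * (1/l) <= p.

    pr_a2 defaults to 1, the worst case of an "accurate" external source
    that lists nobody absent from the microdata.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if not 0 < pr_a2 <= 1:
        raise ValueError("pr_a2 must lie in (0, 1]")
    return math.ceil(pr_a2 / p)


if __name__ == "__main__":
    p = Fraction(1, 5)  # hypothetical maximum breach probability
    # Worst case that the publisher must plan for (same l as anatomy).
    print(required_l(p))
    # Value that would suffice only if the 4/5 from Table 5 could be trusted.
    print(required_l(p, Fraction(4, 5)))
```

Passing p as a `Fraction` keeps the ceiling exact; a float such as 0.2 is not represented exactly and could shift the result at boundary values.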
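Finally, if neither assumption A1 nor A2 is satisfied, the breach probability
of Alice becomes

\[
\mathrm{Pr}_{breach}(\mathrm{Alice}_{s}) = \sum_{\forall x} \mathrm{Pr}_{A1}(x) \cdot \mathrm{Pr}_{A2}(x \mid A1) \cdot \mathrm{Pr}_{breach}(\mathrm{Alice}_{s} \mid A1, A2) \tag{11}
\]
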
where $x$ is a vector representing a possible set of QI-values of Alice, and
$\mathrm{Pr}_{A1}(x)$ equals the probability that $x$ captures Alice's real
QI-values, whereas $\mathrm{Pr}_{A2}$ and $\mathrm{Pr}_{breach}$ follow the same
semantics as in Formula 10, but on condition that $x$ is real. The comparison
results between anatomy and generaliza-