Privacy Preserving Publication: Anonymization Frameworks and Principles - Database Security: Applications and Trends


Name         Age   Sex   Zipcode
Ada                      54000
Alice                    25000
Bella                    25000
Emily                    33000
Stephanie                30000
...

Table 5. The voter registration list (publicly accessible)

To compute $\Pr_{A2}(\mathrm{Alice}_{qi})$, an adversary typically needs to consult an external database [19] that relates QI-values to concrete personal identities for all the persons in the microdata, perhaps together with some other people. An example of such an external source is a voter registration list, partially shown in Table 5, where the record of Emily is italicized to indicate that she is not involved in the microdata of Table 3a. In this scenario, generalization and anatomy make a difference. Specifically, judging from the QI-values of tuples 5-8 in the generalized Table 3a, the adversary sees that each person shown in Table 5 could be involved in the microdata with equal likelihood, and hence calculates $\Pr_{A2}(\mathrm{Alice}_{qi})$ as 4/5. On the other hand, given the anatomized Table 4, the adversary concludes that $\Pr_{A2}(\mathrm{Alice}_{qi}) = 1$ (here s/he can figure out that Emily is definitely absent from the microdata). As a result, generalization provides a stronger overall privacy-preserving guarantee. Nevertheless, since anatomy ensures $\Pr_{breach}(\mathrm{Alice}_{s} \mid A2) \le 1/l$, it also secures the same upper bound $1/l$ for Formula 10.
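The two probabilities above can be reproduced with a few lines of arithmetic. The following is only a sketch under the passage's own assumptions: the five names shown in Table 5 are the candidates, the group formed by tuples 5-8 contains four tuples, and every candidate is equally likely to be involved. The function name `membership_probability` and the variable names are hypothetical and do not come from the source.

```python
from fractions import Fraction


def membership_probability(group_size, candidates):
    """Chance that one candidate is in the microdata, assuming every
    candidate is equally likely to be involved (as the passage does)."""
    return Fraction(group_size, len(candidates))


# Names shown in Table 5; the elided rows ("...") are not modeled here.
voter_list = ["Ada", "Alice", "Bella", "Emily", "Stephanie"]

# Generalization: tuples 5-8 of Table 3a form a group of 4 tuples, and all
# 5 listed voters are assumed to match its generalized QI-values.
pr_generalization = membership_probability(4, voter_list)

# Anatomy: exact QI-values are published, so the adversary can rule out
# Emily; the remaining 4 candidates account for all 4 tuples.
matching = [name for name in voter_list if name != "Emily"]
pr_anatomy = membership_probability(4, matching)

print(pr_generalization, pr_anatomy)  # 4/5 1
```

Exact fractions are used instead of floats so that the results read the same way as the values quoted in the text.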

Although generalization has the above advantage over anatomy, the advantage cannot be leveraged in computing the published data. This is because the publisher cannot predict or control the external database to be utilized by an adversary, and therefore must guard against an “accurate” external source that does not involve any person absent from the microdata. For instance, if Table 5 did not contain Emily, the voter list would produce $\Pr_{A2}(\mathrm{Alice}_{qi}) = 1$ in attacking the privacy of Alice from Table 3a (instead of 4/5 as discussed earlier). In other words, to ensure a maximum breach probability $p$ using generalization, we must still set $l$ to $\lceil 1/p \rceil$, i.e., the same as in applying anatomy.
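The worst-case argument can be illustrated numerically. This is a hedged sketch, not the chapter's procedure: it assumes Formula 10 (not reproduced in this section) is the product of $\Pr_{A2}$ and a conditional breach probability bounded by $1/l$, and it uses a hypothetical target $p = 1/4$. The helper names `required_l` and `breach_bound` are invented for illustration.

```python
import math
from fractions import Fraction


def required_l(p):
    """Smallest integer l satisfying 1/l <= p, i.e. ceil(1/p)."""
    return math.ceil(1 / p)


def breach_bound(pr_a2, l):
    """Assumed upper bound on Formula 10: Pr_A2 times the 1/l guarantee."""
    return pr_a2 * Fraction(1, l)


p = Fraction(1, 4)  # hypothetical target breach probability
l = required_l(p)

# Voter list that still contains Emily: Pr_A2 = 4/5 under generalization.
lucky = breach_bound(Fraction(4, 5), l)

# "Accurate" voter list without Emily: Pr_A2 = 1, the case to guard against.
worst = breach_bound(Fraction(1), l)

print(l, lucky, worst)  # 4 1/5 1/4
```

Only the worst case is under the publisher's control, which is why the choice of `l` depends on `p` alone and not on the external list the adversary happens to hold.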

Finally, if neither assumption A1 nor A2 is satisfied, the breach probability of Alice becomes

$$\sum_{\forall x} \Pr_{A1}(x) \cdot \Pr_{A2}(x \mid A1) \cdot \Pr_{breach}(\mathrm{Alice}_{s} \mid A1, A2) \qquad (11)$$

where $x$ is a vector representing a possible set of QI-values of Alice, and $\Pr_{A1}(x)$ equals the probability that $x$ captures Alice's real QI-values, whereas $\Pr_{A2}$ and $\Pr_{breach}$ follow the same semantics as in Formula 10, but on condition that $x$ is real. The comparison results between anatomy and generaliza-
