which minimizes the amount of information loss, gauged by an appropriate
metric. Although different metrics exist, all of them capture the loss
as a monotone function of the lengths of the generalized intervals (i.e., the
longer the interval, the higher the loss). We do not discuss the details of those
algorithms, most of which can be found in an excellent survey [4]. For the subsequent
discussion, however, we need to make two notes.
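To make the notion of interval-length loss concrete, here is a minimal sketch of one common metric of this kind, the Normalized Certainty Penalty (NCP). The function names and the summation over cells are illustrative assumptions, not taken from the text; the only property carried over from the discussion above is that loss grows monotonically with interval length.

```python
def ncp(interval, domain):
    """Loss of one generalized numeric value: interval length
    normalized by the attribute's full domain range, in [0, 1]."""
    lo, hi = interval
    d_lo, d_hi = domain
    return (hi - lo) / (d_hi - d_lo)

def table_loss(generalized_rows, domains):
    """Total loss of a generalized table: sum of per-cell penalties.
    Each row is a tuple of (lo, hi) intervals, one per QI attribute."""
    return sum(
        ncp(cell, dom)
        for row in generalized_rows
        for cell, dom in zip(row, domains)
    )

# An age generalized to [21, 30] over domain [0, 100] costs 0.09,
# while the ungeneralized point [25, 25] costs 0 -- the longer the
# interval, the higher the loss.
print(ncp((21, 30), (0, 100)))
print(ncp((25, 25), (0, 100)))
```

Any metric with this monotone shape suffices for the NP-hardness and dimensionality remarks that follow.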
NP-Hardness. Finding the optimal k-anonymous generalization (the one with
the smallest information loss) is NP-hard [2, 5, 12], even for simple
information-loss metrics. This fact forces a practitioner to accept an
approximate algorithm. However, the existing solutions [2, 5, 12] cannot
ensure a small approximation ratio (which varies according to the
information-loss norm). In particular, the best known ratio is O(k), which
implies that the output of an algorithm may deviate considerably from the
optimal quality when strong privacy protection is required (e.g., k = 20).
Curse of Dimensionality. It is known [1] that, when the number
of QI attributes is large, k-anonymous generalization inevitably loses a huge
amount of information, even for k = 2.
Machanavajjhala et al. [11] observe two crucial defects of k-anonymity in
guarding against linking attacks. They define two types of attacks
that leverage these defects to breach the privacy of individuals. Next, we
illustrate them using the 2-anonymous generalization in Table 2 of the
microdata in Table 1a.
Homogeneity Attack. Assume that an adversary knows the QI-values
{19, M, 24000} of Sam. After inspecting the published Table 2, s/he
realizes that the tuple of Sam must fall in the third QI-group, consisting of
Tuples 5 and 6. Since both tuples carry the sensitive value pneumonia, the
adversary becomes affirmative that Sam must have contracted that
disease. In other words, the 2-anonymity in Table 2 provides no protection to
the privacy of Sam at all. Note that this observation does not contradict
our earlier conclusion that k-anonymity ensures that an adversary has
only 1/k probability of correctly identifying the real tuple of a victim. In our
case, the adversary is equally uncertain about whether Tuple 5 or 6
belongs to Sam. However, it does not matter, since, either way, Sam must
have contracted the same disease.
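The attack just described reduces to a simple mechanical check, which the following toy sketch illustrates. The representation of a QI-group as a list of sensitive values is an assumption made for illustration; the data mirrors the running example, where both tuples of the third QI-group carry pneumonia.

```python
def disclosed_value(sensitive_values):
    """Return the sensitive value if the QI-group is homogeneous
    (all tuples share one value), else None. A homogeneous group
    discloses its value despite satisfying k-anonymity."""
    distinct = set(sensitive_values)
    return distinct.pop() if len(distinct) == 1 else None

# Third QI-group of the running example: Tuples 5 and 6.
print(disclosed_value(["pneumonia", "pneumonia"]))  # attack succeeds
print(disclosed_value(["flu", "pneumonia"]))        # a diverse group resists
```

The check makes plain why k-anonymity alone is insufficient: it constrains how many tuples share QI-values, but says nothing about the diversity of the sensitive values within a group.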
Background Attack. Now let us consider, once again, the neighbor of
Sarah. As explained before, from Table 2, the neighbor can only figure out
that Sarah may have had flu, Alzheimer's, or pneumonia. This conjecture,
however, may be further refined if the neighbor utilizes her/his "back-
ground knowledge". For example, s/he may know that a flu-vaccine shot
had been offered to all the residents in the neighborhood a month before
Sarah was hospitalized. Hence, it is rather unlikely that the hospitalization
was caused by flu. Furthermore, obviously, Sarah is too young to get in-