which minimizes the amount of information loss, gauged by an appropriate
metric. Although different metrics exist, all of them capture the loss
as a monotone function of the lengths of the generalized intervals (i.e., the
longer the interval, the higher the loss). We do not discuss the details of those
algorithms, most of which can be found in an excellent survey [4]. For the subsequent
discussion, however, we need to make two notes.
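To make the notion of interval-length loss concrete, here is a minimal sketch of one common metric of this kind, the Normalized Certainty Penalty (NCP). The function names and the summation over cells are illustrative assumptions, not taken from the text; the only property carried over from the discussion above is that loss grows monotonically with interval length.

```python
def ncp(interval, domain):
    """Loss of one generalized numeric value: interval length
    normalized by the attribute's full domain range, in [0, 1]."""
    lo, hi = interval
    d_lo, d_hi = domain
    return (hi - lo) / (d_hi - d_lo)

def table_loss(generalized_rows, domains):
    """Total loss of a generalized table: sum of per-cell penalties.
    Each row is a tuple of (lo, hi) intervals, one per QI attribute."""
    return sum(
        ncp(cell, dom)
        for row in generalized_rows
        for cell, dom in zip(row, domains)
    )

# An age generalized to [21, 30] over domain [0, 100] costs 0.09,
# while the ungeneralized point [25, 25] costs 0 -- the longer the
# interval, the higher the loss.
print(ncp((21, 30), (0, 100)))
print(ncp((25, 25), (0, 100)))
```

Any metric with this monotone shape suffices for the NP-hardness and dimensionality remarks that follow.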
NP-Hardness. Finding the optimal k-anonymous generalization (the one with
the smallest information loss) is NP-hard [2, 5, 12], even for simple
information-loss metrics. This fact forces a practitioner to accept an
approximate algorithm. However, the existing solutions [2, 5, 12] cannot
ensure a small approximation ratio (which varies according to the
information-loss norm). In particular, the best known ratio is O(k), which
implies that the output of an algorithm may deviate considerably from the
optimal quality when strong privacy protection is required (e.g., k = 20).
Curse of Dimensionality. It is known [1] that, when the number
of QI attributes is large, k-anonymous generalization inevitably loses a huge
amount of information, even for k = 2.
Machanavajjhala et al. [11] observe two crucial defects of k-anonymity in
guarding against linking attacks. They define two types of attacks
that leverage these defects to breach the privacy of individuals. Next, we
illustrate them using the 2-anonymous generalization in Table 2 of the
microdata in Table 1a.
Homogeneity Attack. Assume that an adversary knows the QI-values
{19, M, 24000} of Sam. After inspecting the published Table 2, s/he
realizes that the tuple of Sam must fall in the third QI-group, consisting of
Tuples 5 and 6. Since both tuples carry the sensitive value pneumonia, the
adversary becomes affirmative that Sam must have contracted that
disease. In other words, the 2-anonymity in Table 2 provides no protection to
the privacy of Sam at all. Note that this observation does not contradict
our earlier conclusion that k-anonymity ensures that an adversary has
only 1/k probability of correctly identifying the real tuple of a victim. In our
case, the adversary is equally uncertain about whether Tuple 5 or 6
belongs to Sam. However, it does not matter, since, either way, Sam must
have contracted the same disease.
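The attack just described reduces to a simple mechanical check, which the following toy sketch illustrates. The representation of a QI-group as a list of sensitive values is an assumption made for illustration; the data mirrors the running example, where both tuples of the third QI-group carry pneumonia.

```python
def disclosed_value(sensitive_values):
    """Return the sensitive value if the QI-group is homogeneous
    (all tuples share one value), else None. A homogeneous group
    discloses its value despite satisfying k-anonymity."""
    distinct = set(sensitive_values)
    return distinct.pop() if len(distinct) == 1 else None

# Third QI-group of the running example: Tuples 5 and 6.
print(disclosed_value(["pneumonia", "pneumonia"]))  # attack succeeds
print(disclosed_value(["flu", "pneumonia"]))        # a diverse group resists
```

The check makes plain why k-anonymity alone is insufficient: it constrains how many tuples share QI-values, but says nothing about the diversity of the sensitive values within a group.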
Background Attack. Now let us consider, once again, the neighbor of
Sarah. As explained before, from Table 2, the neighbor can only figure out
that Sarah may have had flu, Alzheimer's, or pneumonia. This conjecture,
however, may be further refined if the neighbor utilizes her/his "back-
ground knowledge". For example, s/he may know that a flu-vaccine shot
had been offered to all the residents in the neighborhood a month before
Sarah was hospitalized. Hence, it is rather unlikely that the hospitalization
was caused by flu. Furthermore, obviously, Sarah is too young to get in-