fected with Alzheimer's disease. Thus, the adversary becomes (almost fully)
confident that Sarah contracted pneumonia.
These drawbacks led to the development of another anonymization
principle, discussed in the next section.
3 l-diverse Generalization
Both homogeneity and background attacks are caused by the fact that there
is not sufficient diversity in the set of sensitive values present in a QI-group.
For example, in Figure 2, the QI-group containing tuples 5-6 has only one
Disease value, pneumonia, which is what enables the homogeneity attack
illustrated in the previous section. Although more diversity exists in the
last QI-group involving tuples 7-10 (which has 3 sensitive values: flu,
Alzheimer, pneumonia), the degree of diversity is still not enough to
prevent the background attack launched by the adversary (the neighbor of
Sarah mentioned in Section 2), who can exclude 2 diseases, flu and Alzheimer,
from being the real disease of Sarah.
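The background attack on the last QI-group can be sketched in a few lines of Python. This is only an illustration: the function name `remaining_candidates` is hypothetical, and the group is represented by its three distinct Disease values, without assuming how they are distributed over tuples 7-10.

```python
def remaining_candidates(group_values, excluded):
    """Return the sensitive values the adversary still considers possible.

    group_values: sensitive values published in the victim's QI-group.
    excluded: values ruled out through background knowledge.
    """
    return set(group_values) - set(excluded)


# Distinct Disease values of the QI-group holding tuples 7-10 (from the text).
last_group = {"flu", "Alzheimer", "pneumonia"}

# Sarah's neighbor rules out flu and Alzheimer.
candidates = remaining_candidates(last_group, {"flu", "Alzheimer"})
print(candidates)  # {'pneumonia'}
```

With only one candidate left, the adversary learns Sarah's disease even though the group held three different values.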
Evidently, in the worst case, no matter how diverse the sensitive values
are in a QI-group, a highly knowledgeable adversary can still precisely infer
the sensitive value of the victim individual o. Specifically, assume that the
QI-group accommodating the record of o has x different sensitive values, and
that the adversary can correctly assert that o cannot be associated with
x - 1 of them; in this case, the adversary uniquely identifies the true
sensitive value of o. Fortunately, the realistic situation is much more
optimistic, since it is rare for an adversary to be able to exclude that many
sensitive values with respect to o. For instance, among the vast number of
possible diseases, the neighbor of Sarah most likely can exclude only a very
small percentage as the real disease of Sarah.
l-diversity [11] was motivated by exactly this observation. It requires
that, after generalization, every QI-group should contain at least l “well-
represented” sensitive values. Intuitively, this requirement does not allow an
adversary to accurately recover the sensitive value of any individual o,
provided that the adversary can exclude at most l - 2 values (i.e., leaving at
least 2 possibilities for o). Thus, with a sufficiently large l, l-diversity can
effectively prevent privacy breaches.
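The guarantee can be checked exhaustively for a small group. The sketch below assumes that "well-represented" simply means distinct, and it uses a hypothetical helper `min_remaining`; the disease names are illustrative and are not taken from Figure 2.

```python
from itertools import combinations


def min_remaining(group_values, max_excluded):
    """Smallest number of candidate values left after the adversary
    excludes any subset of at most max_excluded distinct values."""
    distinct = set(group_values)
    worst = len(distinct)
    for k in range(max_excluded + 1):
        for excluded in combinations(distinct, k):
            worst = min(worst, len(distinct - set(excluded)))
    return worst


# Illustrative group with l = 4 distinct sensitive values.
group = ["flu", "pneumonia", "bronchitis", "gastritis", "flu"]
l = len(set(group))

print(min_remaining(group, l - 2))  # 2: at least two possibilities remain
print(min_remaining(group, l - 1))  # 1: the worst case discussed earlier
```

Excluding l - 2 values always leaves two candidates, whereas excluding l - 1 values leaves exactly one.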
There are multiple ways to interpret the meaning of “well-represented”.
The simplest one is:
Definition 1. A QI-group fulfills distinctness l-diversity if it contains at
least l different sensitive values.
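Definition 1 can be checked mechanically. The following is a minimal sketch, assuming the generalized table is available as (group id, sensitive value) pairs; the function name `satisfies_distinct_l_diversity` and the sample table are hypothetical.

```python
from collections import defaultdict


def satisfies_distinct_l_diversity(rows, l):
    """rows: iterable of (group_id, sensitive_value) pairs.
    True if every QI-group contains at least l different sensitive values."""
    values_by_group = defaultdict(set)
    for group_id, value in rows:
        values_by_group[group_id].add(value)
    return all(len(values) >= l for values in values_by_group.values())


# Hypothetical generalized table with two QI-groups labeled "A" and "B".
table = [
    ("A", "flu"), ("A", "pneumonia"), ("A", "bronchitis"),
    ("B", "pneumonia"), ("B", "pneumonia"),
]

print(satisfies_distinct_l_diversity(table, 1))  # True
print(satisfies_distinct_l_diversity(table, 2))  # False: group B has one value
```

The check counts only distinct values per group and ignores how often each value occurs.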
Although this interpretation can be easily understood, it does not offer strong
privacy guarantees from a probabilistic point of view. For example, imagine a
QI-group with 1000 tuples, 900 of which carry the same sensitive value HIV,