and the remaining 100 tuples have distinct values different from HIV. Clearly,
the QI-group satisfies distinctness 101-diversity. Nevertheless, the privacy of
HIV patients is poorly preserved. Specifically, consider an adversary who aims
at inferring the disease of such a patient o, and has no background knowledge,
i.e., s/he cannot exclude any disease before studying the published table. With
a random guess, the adversary concludes that o had HIV with probability
900/1000 = 90%. Notice that this process of privacy inference essentially
captures homogeneity attacks as a special case; hence, we refer to the process
as a probabilistic homogeneity attack.
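The arithmetic behind this attack can be sketched in a few lines of Python. This is only an illustration: the function name `guess_probabilities` and the placeholder disease names are hypothetical, and the group composition (900 HIV tuples plus 100 tuples with pairwise distinct other values) is inferred from the 900/1000 figure above.

```python
from collections import Counter


def guess_probabilities(sensitive_values):
    """Return, for each sensitive value, the probability that an adversary
    with no background knowledge assigns to it by guessing at random
    among the tuples of the QI-group."""
    counts = Counter(sensitive_values)
    total = len(sensitive_values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical QI-group: 900 HIV tuples and 100 tuples whose sensitive
# values are all different from HIV and from each other.
group = ["HIV"] * 900 + [f"disease_{i}" for i in range(100)]

probs = guess_probabilities(group)
distinct_values = len(probs)    # number of different sensitive values
hiv_probability = probs["HIV"]  # adversary's confidence that o has HIV
```

The group contains 101 different sensitive values, yet the random guess lands on HIV nine times out of ten.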
This phenomenon leads to an improved version of l-diversity:
Definition 2. A table fulfills frequency l-diversity if, in each QI-group, at
most 1/l of the tuples carry the same sensitive value.
By this reasoning, the last QI-group of Table 2 satisfies frequency 2-diversity,
as the most frequent Disease value, flu, is possessed by half of the tuples in
the group. This definition has an important property: if, before consulting the
published table, an adversary cannot preclude any sensitive value as belonging
to the victim individual o, then with a probabilistic homogeneity attack s/he
can correctly infer the real disease of o with probability at most 1/l.
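As a minimal sketch of Definition 2, the check below tests whether every QI-group keeps its most frequent sensitive value within a 1/l share of the group. The function name `satisfies_frequency_l_diversity` and the sample group are hypothetical; the sample is not the content of Table 2, it merely has its most frequent value in half of the tuples.

```python
from collections import Counter


def satisfies_frequency_l_diversity(qi_groups, l):
    """Check frequency l-diversity: in each QI-group, at most 1/l of the
    tuples may carry the same sensitive value.

    qi_groups is a list of groups, each a list of sensitive values.
    """
    for group in qi_groups:
        if not group:
            continue  # an empty group cannot violate the condition
        max_count = Counter(group).most_common(1)[0][1]
        # max_count / len(group) <= 1 / l, written without division
        if max_count * l > len(group):
            return False
    return True


# Hypothetical QI-group whose most frequent value covers half the tuples.
sample_group = ["flu", "flu", "pneumonia", "bronchitis"]

is_2_diverse = satisfies_frequency_l_diversity([sample_group], 2)
is_3_diverse = satisfies_frequency_l_diversity([sample_group], 3)
```

The comparison is done with integer multiplication rather than a floating-point ratio, so a group sitting exactly on the 1/l boundary is not rejected because of rounding.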
Frequency l-diversity does not provide adequate protection against background
knowledge attacks. To understand this, consider a QI-group with 1000 tuples,
500 of which have the sensitive value HIV, 499 have pneumonia, and the
remaining tuple carries flu. This QI-group satisfies frequency 2-diversity,
since the most frequent value, HIV, belongs to 50% of the tuples. Let o be an
HIV patient. Now, imagine an adversary who knows that this group contains the
record of o, and that o does not have pneumonia. As a result, the record
of o must be one of the 500 HIV tuples, or the flu tuple. At this point, the
adversary cannot exclude any other disease; hence, taking a random guess,
s/he conjectures that o contracted HIV with an exceedingly high probability
500/501 > 99.8%.
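The effect of the adversary's background knowledge can be reproduced with a short Python sketch. The helper name `posterior_after_exclusion` is hypothetical; the counts are those of the example above, and exact fractions are used so that the result can be compared with 500/501 directly.

```python
from fractions import Fraction


def posterior_after_exclusion(counts, excluded):
    """Random-guess probabilities over the tuples that remain after the
    adversary rules out the sensitive values in `excluded`.

    counts maps each sensitive value to its number of tuples in the group.
    """
    remaining = {v: c for v, c in counts.items() if v not in excluded}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("every tuple in the group was excluded")
    return {v: Fraction(c, total) for v, c in remaining.items()}


# QI-group from the example: 500 HIV, 499 pneumonia, 1 flu.
group_counts = {"HIV": 500, "pneumonia": 499, "flu": 1}

# Without background knowledge the adversary's confidence is 1/2.
prior = posterior_after_exclusion(group_counts, set())

# Knowing that o does not have pneumonia leaves 501 candidate tuples.
posterior = posterior_after_exclusion(group_counts, {"pneumonia"})
```

Here `prior["HIV"]` is 1/2, in line with frequency 2-diversity, while `posterior["HIV"]` is 500/501.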
The cause of the above problem is as follows: after removing the 2nd most
frequent sensitive value (i.e., pneumonia) in a QI-group, the most frequent
sensitive value (HIV) accounts for an excessively high proportion of the
remaining tuples in the group. The implication is that it is not enough to
limit the frequency of the most frequent sensitive value with respect to the
QI-group size (as is the case in frequency l-diversity). Instead, we should
limit the frequency with respect to the number of remaining tuples, after
eliminating those having the 2nd most frequent sensitive value. Remember that
we arrived at this conclusion by assuming that an adversary can preclude a
single sensitive value as owned by the victim. Carrying the reasoning to the
general scenario, if an adversary can exclude at most l − 2 values, we ought to
constrain the frequency of the most frequent sensitive value, with respect to
the remaining tuples, after discarding the 2nd, 3rd, ..., (l − 1)-th most
frequent sensitive values. This leads to the next version of l-diversity.
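The quantity that this reasoning asks us to bound can be computed as follows. This is a sketch under stated assumptions, not the definition that follows in the text: the function name `worst_case_confidence` is hypothetical, and since the passage does not state what threshold the ratio must respect, the code only computes the ratio.

```python
from fractions import Fraction


def worst_case_confidence(counts, l):
    """Share of the most frequent sensitive value among the tuples that
    remain after discarding the 2nd, 3rd, ..., (l - 1)-th most frequent
    sensitive values, i.e. after an adversary excludes l - 2 values.

    counts maps each sensitive value to its number of tuples in the group.
    """
    if l < 2:
        raise ValueError("l must be at least 2")
    ordered = sorted(counts.values(), reverse=True)
    discarded = sum(ordered[1:l - 1])  # empty slice when l == 2
    remaining = sum(ordered) - discarded
    return Fraction(ordered[0], remaining)


# QI-group from the earlier example: 500 HIV, 499 pneumonia, 1 flu.
group_counts = {"HIV": 500, "pneumonia": 499, "flu": 1}

no_exclusion = worst_case_confidence(group_counts, 2)   # nothing discarded
one_exclusion = worst_case_confidence(group_counts, 3)  # pneumonia discarded
```

With l = 2 nothing is discarded and the ratio is the plain frequency 1/2; with l = 3 the pneumonia tuples are discarded and the ratio rises to 500/501, matching the attack described earlier.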