Fig. 11.1 Example of a k-anonymized data table T', k = 3. Attributes Zipcode and
Nationality have been generalized to ensure 3-anonymity. From (Fung, Wang et al. 2010).
the data in T that will satisfy k-anonymity for a given k. It has been shown
(Bonizzoni, Vedova et al. 2009) that such a task is NP-complete, and therefore the
existing, practical k-anonymization methods (Sweeney 1998) (El Emam, Dankar
et al. 2009) are not necessarily optimal in the above sense.
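The k-anonymity condition itself is cheap to verify, even though finding an optimal generalization is hard. The following is a minimal sketch, assuming records are stored as dictionaries; the function name `is_k_anonymous`, the column names, and the sample rows are hypothetical illustrations and do not reproduce the data of Fig. 11.1.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    (i.e. every equivalence class) occurs in at least k records."""
    class_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in class_sizes.values())


# Hypothetical generalized table: two equivalence classes of three records each.
table = [
    {"zipcode": "130**", "nationality": "*", "diagnosis": "flu"},
    {"zipcode": "130**", "nationality": "*", "diagnosis": "asthma"},
    {"zipcode": "130**", "nationality": "*", "diagnosis": "cancer"},
    {"zipcode": "148**", "nationality": "*", "diagnosis": "flu"},
    {"zipcode": "148**", "nationality": "*", "diagnosis": "flu"},
    {"zipcode": "148**", "nationality": "*", "diagnosis": "flu"},
]

quasi = ["zipcode", "nationality"]
print(is_k_anonymous(table, quasi, 3))  # both classes have 3 records
print(is_k_anonymous(table, quasi, 4))  # no class has 4 records
```

The check only groups and counts; the difficult part, choosing which attributes to generalize and by how much, is not addressed by this sketch.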
It needs to be observed that k-anonymity does not fully resolve data privacy
problems. With additional domain knowledge, which the attacker will often possess,
successful attacks, albeit of a different type, are still possible. For instance, if
all the records in an equivalence class in a k-anonymized T' have the same value
of a sensitive attribute (e.g. the medical diagnosis), then mapping an instance i to
that equivalence class will inevitably give away the value of this attribute for i.
This would then become a successful attribute disclosure attack. In order to
avoid this kind of privacy attack, k-anonymity is often extended to require
l-diversity: every equivalence class in T' must have at least l distinct values of
the sensitive attribute.
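The attribute disclosure scenario and the diversity requirement described above can be sketched in a few lines. This is a minimal illustration that assumes the simplest reading of the requirement, namely counting distinct sensitive values per equivalence class; the function name `is_l_diverse`, the column names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Return True if every equivalence class contains at least l
    distinct values of the sensitive attribute."""
    values_per_class = defaultdict(set)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        values_per_class[key].add(record[sensitive])
    return all(len(values) >= l for values in values_per_class.values())


# Hypothetical 3-anonymous table. Every record in the second class shares
# the same diagnosis, so linking a person to that class discloses it.
rows = [
    {"zipcode": "130**", "diagnosis": "flu"},
    {"zipcode": "130**", "diagnosis": "asthma"},
    {"zipcode": "130**", "diagnosis": "cancer"},
    {"zipcode": "148**", "diagnosis": "cancer"},
    {"zipcode": "148**", "diagnosis": "cancer"},
    {"zipcode": "148**", "diagnosis": "cancer"},
]

print(is_l_diverse(rows, ["zipcode"], "diagnosis", 2))       # second class fails
print(is_l_diverse(rows[:3], ["zipcode"], "diagnosis", 2))   # first class alone passes
```
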
l-diversity, however, is also prone to attacks: consider a two-class
problem assigning a sensitive medical diagnosis to people. Being put in the
positive class may stigmatize an individual and may lead to discrimination. But if
the cluster contains only negative individuals, there is no need for diversity:
nobody will mind being in this cluster, as no negative inference can be associated
with this membership. On the other hand, knowing that one is in a cluster with 49
positive individuals and 1 negative individual makes it highly likely (98%) that one
has the condition, while knowing that one is in a cluster with 49 negative
individuals and 1 positive individual is completely different (a 2% likelihood).
Both clusters, however, have the same 2-diversity. (Li and Li 2007) have therefore
proposed yet another privacy model, known as t-closeness, attempting to fix these
shortcomings of l-diversity. A cluster