Fig. 11.1 Example of a k-anonymized data table T', k = 3. Attributes Zipcode and
Nationality have been generalized to ensure 3-anonymity. From (Fung, Wang et al. 2010).
the data in T that will satisfy k-anonymity for a given k. It has been shown
(Bonizzoni, Vedova et al. 2009) that this task is NP-complete, and therefore the
existing, practical k-anonymization methods (Sweeney 1998) (El Emam, Dankar
et al. 2009) are not necessarily optimal in the above sense.
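Although finding an optimal k-anonymization is hard, checking whether a given table already satisfies k-anonymity is straightforward: group the records by their quasi-identifier values and verify that every group (equivalence class) has at least k members. The following is a minimal sketch of that check. The function names, the column names and the table contents are illustrative assumptions; the rows are made up and are not the data shown in Fig. 11.1.

```python
from collections import Counter


def equivalence_class_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(rows, quasi_identifiers)
    return all(size >= k for size in sizes.values())


# Hypothetical generalized table T' (made-up values, for illustration only).
# "Diagnosis" plays the role of a sensitive attribute and is ignored by the check.
T_prime = [
    {"Zipcode": "130**", "Nationality": "*", "Diagnosis": "flu"},
    {"Zipcode": "130**", "Nationality": "*", "Diagnosis": "asthma"},
    {"Zipcode": "130**", "Nationality": "*", "Diagnosis": "flu"},
    {"Zipcode": "148**", "Nationality": "*", "Diagnosis": "diabetes"},
    {"Zipcode": "148**", "Nationality": "*", "Diagnosis": "flu"},
    {"Zipcode": "148**", "Nationality": "*", "Diagnosis": "asthma"},
]
QUASI_IDENTIFIERS = ["Zipcode", "Nationality"]

print(is_k_anonymous(T_prime, QUASI_IDENTIFIERS, 3))
print(is_k_anonymous(T_prime, QUASI_IDENTIFIERS, 4))
```

This sketch only verifies the property. It does not search for a generalization that achieves it, which is the hard part of the problem.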
It should be noted that k-anonymity does not fully resolve data privacy
problems. With additional domain knowledge, which an attacker will often
possess, successful attacks, albeit of a different type, are still possible. For
instance, if all the records in an equivalence class in a k-anonymized T' have
the same value of a sensitive attribute (e.g. the medical diagnosis), then mapping
an instance i to that equivalence class will inevitably give away the value of this
attribute for i. This constitutes a successful attribute disclosure attack. To
avoid this kind of privacy attack, k-anonymity is often extended to require
l-diversity: every equivalence class in T' must contain at least l distinct values
of the sensitive attribute. l-diversity, however, is also prone to attacks: consider a two-class
problem assigning a sensitive medical diagnosis to people. Being put in the
positive class may stigmatize an individual and may lead to discrimination. But if
a cluster contains only negative individuals, there is no need for diversity:
nobody will mind being in this cluster, as no negative inference can be associated
with this membership. On the other hand, knowing that one is in a cluster with 49
positive individuals and one negative individual makes it highly likely (98%) that
one has the condition, while knowing that one is in a cluster with 49 negative
individuals and one positive individual makes it highly unlikely (2%). Both
clusters, however, have the same 2-diversity. (Li and Li 2007) have therefore
proposed yet another privacy model, known as t-closeness, attempting to fix these
shortcomings of l-diversity. A cluster