3 The k-Anonymity Framework
The randomization method is a simple technique that can be easily
implemented at data-collection time, because the noise added to a given
record is independent of the behavior of other data records. This is also a
weakness, because outlier records can often be difficult to mask. Clearly,
in cases in which privacy preservation does not need to be performed at
data-collection time, it is desirable to have a technique in which the level
of inaccuracy depends upon the behavior of the locality of that given record.
Another key weakness of the randomization framework is that it does not
consider the possibility that publicly available records can be used to
identify the owners of the records. In [10], it has been shown that the use
of publicly available records can heavily compromise privacy in
high-dimensional cases. This is especially true of outlier records, which can
be easily distinguished from other records in their locality.
In many applications, the data records are made available by simply removing
key identifiers such as the name and social-security number from personal
records. However, other kinds of attributes (known as pseudo-identifiers or
quasi-identifiers) can be used to accurately identify the records. For
example, attributes such as age, zip code and sex are available in public
records such as census rolls. When these attributes are also available in a
given data set, they can be used to infer the identity of the corresponding
individual. A combination of these attributes can be very powerful, since it
can be used to narrow down the possibilities to a small number of individuals.
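The linking step described above can be illustrated with a minimal sketch. The two tables, the attribute names, and the function `link` are hypothetical and chosen only for illustration; the sketch simply matches each released record against a public roll on the shared pseudo-identifiers.

```python
from collections import defaultdict

# Hypothetical released table: names removed, pseudo-identifiers kept.
released = [
    {"age": 34, "zip": "10001", "sex": "F", "diagnosis": "asthma"},
    {"age": 34, "zip": "10001", "sex": "M", "diagnosis": "diabetes"},
    {"age": 71, "zip": "10002", "sex": "F", "diagnosis": "arthritis"},
]

# Hypothetical public roll that still carries names.
public_roll = [
    {"name": "Alice", "age": 34, "zip": "10001", "sex": "F"},
    {"name": "Bob", "age": 34, "zip": "10001", "sex": "M"},
    {"name": "Carol", "age": 71, "zip": "10002", "sex": "F"},
    {"name": "Dana", "age": 71, "zip": "10002", "sex": "F"},
]

PSEUDO_IDENTIFIERS = ("age", "zip", "sex")


def link(released_rows, public_rows, keys=PSEUDO_IDENTIFIERS):
    """For each released row, list the public names that share its key values."""
    index = defaultdict(list)
    for person in public_rows:
        index[tuple(person[k] for k in keys)].append(person["name"])
    # .get avoids creating empty entries for unmatched rows.
    return [index.get(tuple(row[k] for k in keys), []) for row in released_rows]


candidates = link(released, public_roll)
for row, names in zip(released, candidates):
    print(row["diagnosis"], "->", names)
```

In this toy data the first two released records each match exactly one named individual, so removing the name alone did not protect them; the third record matches two individuals and is therefore ambiguous.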
In k-anonymity techniques [98], we reduce the granularity of representation
of these pseudo-identifiers with the use of techniques such as generalization
and suppression. In the method of generalization, the attribute values are
generalized to a range in order to reduce the granularity of representation.
For example, the date of birth could be generalized to a range such as year
of birth, so as to reduce the risk of identification. In the method of
suppression, the value of the attribute is removed completely. It is clear
that such methods reduce the risk of identification with the use of public
records, at the cost of reduced accuracy for applications that use the
transformed data.
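As a rough illustration of the two operations, the sketch below generalizes a date of birth to its year, generalizes an age to a fixed-width range, and suppresses an attribute. The function names, the record layout, the ten-year range width, and the use of "*" as the marker for a suppressed value are all assumptions made for this example, not part of the method in [98].

```python
from datetime import date


def generalize_birth_date(birth_date: date) -> int:
    """Generalize a full date of birth to the year of birth."""
    return birth_date.year


def generalize_age(age: int, width: int = 10) -> str:
    """Generalize an exact age to a range of the given width, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress(record: dict, attribute: str) -> dict:
    """Return a copy of the record with the attribute value removed ('*')."""
    masked = dict(record)
    masked[attribute] = "*"
    return masked


# Hypothetical record used only for illustration.
record = {"birth_date": date(1985, 3, 14), "zip": "10001", "sex": "F"}

transformed = suppress(record, "zip")
transformed["birth_date"] = generalize_birth_date(record["birth_date"])
print(transformed)
print(generalize_age(37))
```

The original record is left unchanged; only the copy carries the coarser values, which makes it easy to compare the precision lost against the anonymity gained.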
In order to reduce the risk of identification, the k-anonymity approach
requires that every tuple in the table be indistinguishably related to no
fewer than k respondents. This can be formalized as follows:
Definition 1. Each release of the data must be such that every combination
of values of quasi-identifiers can be indistinguishably matched to at least k
respondents.
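One common way to check this condition operationally is to count how many rows of the released table share each combination of quasi-identifier values. The sketch below assumes one row per respondent and uses hypothetical attribute names; `is_k_anonymous` is an illustrative helper, not an algorithm from [98].

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination present occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical table whose quasi-identifiers have already been generalized.
table = [
    {"age": "30-39", "zip": "100**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "100**", "diagnosis": "diabetes"},
    {"age": "70-79", "zip": "100**", "diagnosis": "arthritis"},
]

QUASI_IDENTIFIERS = ("age", "zip")

before = is_k_anonymous(table, QUASI_IDENTIFIERS, k=2)

# A second respondent in the 70-79 group makes every group reach size 2.
table.append({"age": "70-79", "zip": "100**", "diagnosis": "asthma"})
after = is_k_anonymous(table, QUASI_IDENTIFIERS, k=2)

print(before, after)
```

The first check fails because the combination ("70-79", "100**") appears only once; after the extra row is added, every combination appears at least twice and the table satisfies the condition for k = 2.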
The first algorithm for k-anonymity was proposed in [98]. The approach uses
domain generalization hierarchies of the quasi-identifiers in order to build
k-anonymous tables. The concept of k-minimal generalization has been proposed
in [98] in order to limit the level of generalization, so as to maintain as
much data precision as possible for a given level of anonymity. Subsequently,
the topic of