set of individuals as in the microdata. For instance, Mike in the list of Table 1b
is absent from Table 1a. Nevertheless, by performing an equi-join between the
two tables, an adversary easily recovers the identities of all the patients in the
microdata.
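As a minimal sketch of such a linking attack, the Python snippet below joins
an identified external list with a microdata table on their shared QI
attributes. The rows, the disease values, the attribute names (age, sex,
zipcode) and the helper equi_join are hypothetical illustrations and do not
reproduce Tables 1a and 1b; only Sarah's QI values (28, F, 37000) and the fact
that Mike appears only in the external list follow the text.

```python
# Hypothetical microdata without names; the disease values are invented.
microdata = [
    {"age": 5, "sex": "M", "zipcode": 12000, "disease": "flu"},
    {"age": 28, "sex": "F", "zipcode": 37000, "disease": "bronchitis"},
]

# Hypothetical external list with names; Mike has no match in the microdata.
external = [
    {"name": "Sarah", "age": 28, "sex": "F", "zipcode": 37000},
    {"name": "Mike", "age": 41, "sex": "M", "zipcode": 55000},
]

# Assumed QI attributes shared by both tables.
QI = ("age", "sex", "zipcode")


def equi_join(left, right, keys):
    """Return merged rows for every pair that agrees on all key attributes."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


# The adversary links names to sensitive values through the QI attributes.
linked = equi_join(external, microdata, QI)
for row in linked:
    print(row["name"], "->", row["disease"])
```

In this toy data the join re-identifies Sarah, while Mike simply produces no
match because he does not occur in the microdata.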
Prevention of linking attacks is important, because such attacks strongly
discourage a publisher (e.g., a hospital or a census bureau) from sharing
its data with researchers, who rely on such data to verify hypotheses formed
in their laboratories. Several methods have been developed in the database
community to counter such attacks, by computing an adequately anonymized
version T* of the microdata T. Each method is essentially the integration
of an anonymization framework and an anonymization principle. Specifically,
a framework describes how anonymization is performed, whereas a principle
measures whether a sufficient amount of anonymization has been applied.
In the sequel, we will discuss the characteristics of two existing frameworks,
generalization and anatomy, and those of the two most popular principles,
k-anonymity and l-diversity.
The rest of this chapter is organized as follows. Section 2 introduces the
concept of k-anonymous generalization, and points out its vulnerabilities to
linking attacks. Section 3 clarifies l-diversity and how it remedies the defects
of k-anonymity, again assuming generalization is the underlying anonymization
framework. Section 4 explains how l-diversity can be implemented with
anatomy, and compares the two anonymization frameworks. Section 5 identifies
several limitations of l-diversity. Finally, Section 6 provides a summary of
the chapter.
2 k-anonymous Generalization
Given a microdata table T, the generalization [13, 15] anonymization framework
replaces each QI-value with a less specific form, such that the QI-values
of a tuple become indistinguishable from those of some other tuples. Table 2
demonstrates a generalized version of the microdata in Table 1a. For example,
the age 5 of Tuple 1 in Table 1a has been generalized to an interval [1, 10] in
Table 2. Semantically, the interval indicates that the original age of Tuple 1
may be any value in the range of [1, 10].
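The replacement of QI-values by less specific forms can be sketched in Python
as follows. The input rows, the fixed interval width of 10, and the helper
names generalize_age and group_by_generalized_qi are assumptions made for
illustration; they do not reproduce Table 1a or Table 2, except that age 5 maps
to [1, 10] as in the text. The sketch also collects the tuples whose
generalized values coincide, which is what makes them indistinguishable.

```python
from collections import defaultdict


def generalize_age(age, width=10):
    """Map an age >= 1 to the closed interval (low, high) that contains it."""
    low = ((age - 1) // width) * width + 1
    return (low, low + width - 1)


# Hypothetical tuples; only age 5 for Tuple 1 comes from the text.
rows = [
    {"id": 1, "age": 5, "sex": "M"},
    {"id": 2, "age": 9, "sex": "M"},
    {"id": 3, "age": 23, "sex": "F"},
    {"id": 4, "age": 28, "sex": "F"},
]


def group_by_generalized_qi(rows):
    """Group tuple IDs by their generalized QI values (age interval, sex)."""
    groups = defaultdict(list)
    for row in rows:
        key = (generalize_age(row["age"]), row["sex"])
        groups[key].append(row["id"])
    return dict(groups)


groups = group_by_generalized_qi(rows)
for key, ids in groups.items():
    print(key, ids)
```

A fixed-width interval is only one possible generalization scheme; it is used
here because it keeps the mapping from a value to its less specific form easy
to follow.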
Notice that Tuples 1 and 2 have exactly the same generalized value on
every QI attribute and therefore constitute a “QI-group”. Formally, a QI-group
is a group resulting from grouping the tuples in a relation by all the QI
attributes. Clearly, Table 2 involves 4 QI-groups: {1, 2}, {3, 4}, {5, 6}, and
{7, 8, 9, 10} (indicated by tuple IDs). It is worth mentioning that the notion
of “QI-group” is also known by several other names, such as “equivalent class”
[9], “q-block” [11], and so on.
Assume that the publisher releases Table 2. Consider the linking attack
launched by the neighbor of Sarah who, as mentioned in Section 1, possesses
the QI values {28, F, 37000} of Sarah. To guess which tuples may belong to