Privacy Preserving Publication: Anonymization Frameworks and Principles - Database Security: Applications and Trends


set of individuals as in the microdata. For instance, Mike in the list of Table 1b is absent from Table 1a. Nevertheless, by performing an equi-join between the two tables, an adversary easily recovers the identities of all the patients in the microdata.
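The equi-join behind such a linking attack can be illustrated with a minimal sketch. Tables 1a and 1b are not reproduced here, so the rows below are hypothetical stand-ins: only Sarah's QI values (28, F, 37000), the age 5 of the first tuple, and Mike's absence from the microdata follow the text. The other names, QI values, diseases, and the helper `equi_join` are assumptions made purely for illustration.

```python
# Hypothetical microdata with names removed: QI attributes plus a sensitive value.
microdata = [
    {"age": 5, "sex": "M", "zipcode": 12000, "disease": "gastric ulcer"},
    {"age": 28, "sex": "F", "zipcode": 37000, "disease": "pneumonia"},
]

# Hypothetical external list that pairs names with the same QI attributes.
# Mike appears here but has no tuple in the microdata.
external = [
    {"name": "Bob", "age": 5, "sex": "M", "zipcode": 12000},
    {"name": "Mike", "age": 7, "sex": "M", "zipcode": 17000},
    {"name": "Sarah", "age": 28, "sex": "F", "zipcode": 37000},
]

QI = ("age", "sex", "zipcode")


def equi_join(left, right, keys):
    """Join two lists of dicts on exact equality of the given key attributes."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    return [
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[k] for k in keys), [])
    ]


# Every external row whose QI values match a microdata tuple is re-identified.
linked = equi_join(external, microdata, QI)
for row in linked:
    print(row["name"], "->", row["disease"])
```

In this toy data each microdata tuple has unique QI values, so the join attaches a name to every tuple, while Mike simply produces no match.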

Prevention of linking attacks is important, because such attacks strongly discourage a publisher (e.g., a hospital or a census bureau) from sharing its data with researchers, who rely on such data to verify the hypotheses formed in their laboratories. Several methods have been developed in the database community to counter such attacks, by computing an adequately anonymized version T* of the microdata T. Each method is essentially the integration of an anonymization framework and an anonymization principle. Specifically, a framework describes how anonymization is performed, whereas a principle measures whether a sufficient amount of anonymization has been applied. In the sequel, we will discuss the characteristics of two existing frameworks, generalization and anatomy, and those of the two most popular principles, k-anonymity and l-diversity.

The rest of this chapter is organized as follows. Section 2 introduces the concept of k-anonymous generalization, and points out its vulnerabilities to linking attacks. Section 3 clarifies l-diversity and how it remedies the defects of k-anonymity, again assuming generalization is the underlying anonymization framework. Section 4 explains how l-diversity can be implemented with anatomy, and compares the two anonymization frameworks. Section 5 identifies several limitations of l-diversity. Finally, Section 6 provides a summary of the chapter.

2 k-anonymous Generalization

Given a microdata table T, the generalization framework [13, 15] replaces each QI-value with a less specific form, such that the QI-values of a tuple become indistinguishable from those of some other tuples. Table 2 demonstrates a generalized version of the microdata in Table 1a. For example, the age 5 of Tuple 1 in Table 1a has been generalized to the interval [1, 10] in Table 2. Semantically, the interval indicates that the original age of Tuple 1 may be any value in the range [1, 10].
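As a rough sketch of the idea, generalization can be viewed as a mapping from each QI-value to a coarser one, after which tuples with identical generalized QI-values can no longer be told apart. The fixed interval widths, the helper names `generalize`, `generalize_row`, and `group_by_generalized_qi`, and all rows other than the age 5 of Tuple 1 are assumptions for illustration. Table 2 itself need not use uniform intervals.

```python
from collections import defaultdict


def generalize(value, width):
    """Return the closed interval (lo, hi) of the given width that contains value.

    Intervals are aligned as [1, width], [width + 1, 2 * width], and so on.
    """
    lo = (value - 1) // width * width + 1
    return (lo, lo + width - 1)


# Hypothetical microdata keyed by tuple ID: (age, sex, zipcode).
rows = {
    1: (5, "M", 12000),
    2: (9, "M", 14000),
    3: (6, "F", 18000),
    4: (8, "F", 19000),
}


def generalize_row(row):
    """Generalize age and zipcode to intervals; sex is kept unchanged here."""
    age, sex, zipcode = row
    return (generalize(age, 10), sex, generalize(zipcode, 10000))


def group_by_generalized_qi(rows):
    """Collect tuple IDs whose generalized QI-values coincide."""
    groups = defaultdict(list)
    for tuple_id, row in rows.items():
        groups[generalize_row(row)].append(tuple_id)
    return dict(groups)


groups = group_by_generalized_qi(rows)
for qi_values, tuple_ids in groups.items():
    print(qi_values, tuple_ids)

# Size of the smallest group of mutually indistinguishable tuples.
smallest_group = min(len(ids) for ids in groups.values())
```

With these toy rows, `generalize(5, 10)` yields the interval (1, 10) from the text, and the size of the smallest group is the kind of quantity that an anonymization principle can then constrain.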

Notice that Tuples 1 and 2 have exactly the same generalized value on every QI attribute, and therefore, constitute a “QI-group”. Formally, a QI-group is a group resulting from grouping the tuples in a relation by all the QI attributes. Clearly, Table 2 involves 4 QI-groups: {1, 2}, {3, 4}, {5, 6}, and {7, 8, 9, 10} (indicated by tuple IDs). It is worth mentioning that the notion of “QI-group” is also known by several other names, such as “equivalent class” [9], “q-block” [11], and so on.

Assume that the publisher releases Table 2. Consider the linking attack launched by the neighbor of Sarah who, as mentioned in Section 1, possesses the QI values {28, F, 37000} of Sarah. To guess which tuples may belong to
