20 Privacy Preserving Publication: Anonymization Frameworks and Principles
Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong
Sha Tin, New Territories, Hong Kong
taoyf@cse.cuhk.edu.hk
Summary. Given a microdata table T, the objective of privacy preserving publication is to release a distorted version T* of T such that T* does not allow an adversary to confidently derive the sensitive data of any individual, and yet T* can be used to analyze the statistical patterns significant in T. Each existing method of privacy preserving publication is essentially the integration of an anonymization framework and an anonymization principle. Specifically, a framework describes how anonymization is performed, whereas a principle measures whether a sufficient amount of anonymization has been applied. In this chapter, we discuss the characteristics of two existing frameworks, generalization and anatomy, and of the two most popular principles, k-anonymity and l-diversity.
1 Introduction
This chapter discusses an important problem in the literature of data privacy protection, known as privacy preserving publication. Formally, we have a trusted publisher that holds a microdata table T, where each tuple describes the information of an individual. For our discussion, assume that T has d non-sensitive attributes A_1, A_2, ..., A_d and a sensitive attribute A_s. The objective is to publish an anonymized version T* of T such that T* does not allow an adversary to confidently derive the sensitive data of any individual, and yet T* can be used to analyze the statistical patterns significant in T.
As a concrete application example, consider that the publisher is a hospital, and T is given in Table 1a. Here, T has three non-sensitive attributes A_1 = Age, A_2 = Sex, A_3 = Zipcode, and a sensitive attribute A_s = Disease. The column Name specifies the owners of the tuples; e.g., Tuple 1 indicates that Andy, aged 5, lives in a neighborhood with Zipcode 12000, and that he contracted gastric-ulcer. Obviously, Name should not be published along with T, since it explicitly reveals the identities of all individuals.
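
To make the table layout concrete, the following minimal Python sketch represents a microdata table as a list of records and strips the explicit identifier column. Only the first record follows Tuple 1 as described in the text; the second record, the constant names, and the helper drop_identifier are hypothetical and introduced purely for illustration.

```python
# Hypothetical in-memory layout of a microdata table (not from the chapter).
IDENTIFIER = "Name"
NON_SENSITIVE = ("Age", "Sex", "Zipcode")   # A_1, A_2, A_3
SENSITIVE = "Disease"                       # A_s

# Row 1 mirrors Tuple 1 in the text; row 2 is made up for illustration only.
T = [
    {"Name": "Andy", "Age": 5, "Sex": "M", "Zipcode": 12000,
     "Disease": "gastric-ulcer"},
    {"Name": "Person-2", "Age": 40, "Sex": "F", "Zipcode": 30000,
     "Disease": "flu"},
]


def drop_identifier(table, identifier=IDENTIFIER):
    """Return a copy of the table without the explicit identifier column."""
    return [
        {attr: value for attr, value in row.items() if attr != identifier}
        for row in table
    ]


# The original table T is left unchanged; only the copy loses Name.
t_without_name = drop_identifier(T)
```

Each record in t_without_name still carries the non-sensitive attributes together with the sensitive value, so this step only removes the explicit identity and applies no anonymization framework or principle.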
Let T' be the resulting table after removing Name from T. At first glance, it appears that we can simply release T', which, by itself, does not contain any