Quasi-identifier attributes: Gender, Date of Birth, ZIP Code. Sensitive attribute: Disease.

Gender   Date of Birth   ZIP Code   Disease
------   -------------   --------   -------------
F        1988            561*       Flu
F        1988            561*       Flu
F        1988            561*       Flu
M        1990            910*       Heart Disease
M        1990            910*       Cold
M        1990            910*       Flu

Figure 9.1 A 3-anonymous database.
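The caption's claim can be checked mechanically. The following minimal Python sketch (an illustration, not a method taken from this chapter) encodes the rows of Figure 9.1, groups them by their quasi-identifier values (Gender, Date of Birth, ZIP Code), and takes k to be the size of the smallest group. The names `ROWS`, `QI_COLUMNS`, `SENSITIVE_COLUMN`, `equivalence_classes`, and `anonymity_level` are hypothetical.

```python
from collections import defaultdict

# Rows of Figure 9.1: (Gender, Date of Birth, ZIP Code, Disease).
ROWS = [
    ("F", "1988", "561*", "Flu"),
    ("F", "1988", "561*", "Flu"),
    ("F", "1988", "561*", "Flu"),
    ("M", "1990", "910*", "Heart Disease"),
    ("M", "1990", "910*", "Cold"),
    ("M", "1990", "910*", "Flu"),
]
QI_COLUMNS = (0, 1, 2)  # indices of the quasi-identifier attributes
SENSITIVE_COLUMN = 3    # index of the sensitive attribute


def equivalence_classes(rows, qi_columns):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in qi_columns)
        classes[key].append(row)
    return dict(classes)


def anonymity_level(rows, qi_columns):
    """Largest k for which the table is k-anonymous, i.e. the smallest class size."""
    return min(len(group) for group in equivalence_classes(rows, qi_columns).values())


if __name__ == "__main__":
    for key, group in equivalence_classes(ROWS, QI_COLUMNS).items():
        diseases = [row[SENSITIVE_COLUMN] for row in group]
        print(key, len(group), diseases)
    print("k =", anonymity_level(ROWS, QI_COLUMNS))
```

Running it prints two equivalence classes of three records each and k = 3, matching the caption. Note that every record in the first class has the disease Flu, which is the situation discussed under l-diversity.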
different models have been proposed by the scientific community to protect
privacy while personal sensitive information is shared and analyzed.
The most important privacy models are k-anonymity, l-diversity, t-closeness,
randomization, and cryptography-based models.
k-anonymity
The k-anonymity model was introduced in the context of relational databases,
where data are stored in a table and each row of this table corresponds to one
individual. The basic idea of the k-anonymity model is to guarantee that the
information of every data subject cannot be distinguished from the information
of at least k − 1 other data subjects. This model is based on the assumption of the
existence of the following kinds of attributes in each user's record: identifiers,
which explicitly identify data owners, such as name and social security number
(SSN); quasi-identifiers, which, in combination, could identify data owners or small
groups of them (e.g., gender and zip code); and sensitive attributes, which represent
person-specific information to be protected (e.g., disease and salary). Based on
this classification, the privacy requirement defined by k-anonymity is that for
each released record (e.g., a row in the table in Figure 9.1) there must
be at least k − 1 other records with the same quasi-identifier values. A set of
records that share the same quasi-identifier values is called an equivalence
class. The techniques adopted in the literature to enforce k-anonymity involve
removing explicit identifiers and then applying generalization (e.g., date of birth
is replaced by the year of birth), suppression (e.g., the date of birth is removed),
or microaggregation (clustering and averaging) to the quasi-identifiers. All of
these techniques reduce the accuracy of the disclosed information.
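The generalization and suppression steps described above can be sketched on a toy table. This is a minimal illustration under stated assumptions, not a production anonymizer: the input records are hypothetical (they are not the original data behind Figure 9.1), explicit identifiers are assumed to be already removed, dates of birth are generalized to the year, ZIP codes are truncated to a three-digit prefix followed by `*`, and any record whose equivalence class is still smaller than k is dropped. Note that this sketch uses record-level suppression for simplicity, whereas the text's example suppresses an attribute. The names `RAW`, `generalize`, and `k_anonymize` are hypothetical.

```python
from collections import Counter

# Hypothetical raw records: (gender, date of birth "YYYY-MM-DD", ZIP code, disease).
# Explicit identifiers (name, SSN) are assumed to have been removed already.
RAW = [
    ("F", "1988-03-14", "56101", "Flu"),
    ("F", "1988-07-02", "56112", "Flu"),
    ("F", "1988-11-23", "56120", "Flu"),
    ("M", "1990-01-30", "91004", "Heart Disease"),
    ("M", "1990-05-17", "91011", "Cold"),
    ("M", "1990-09-08", "91020", "Flu"),
    ("M", "1975-06-01", "10001", "Cold"),  # an outlier with no similar records
]


def generalize(record, zip_digits=3):
    """Generalize quasi-identifiers: birth date -> birth year, ZIP -> prefix + '*'."""
    gender, dob, zip_code, disease = record
    return (gender, dob[:4], zip_code[:zip_digits] + "*", disease)


def k_anonymize(records, k, zip_digits=3):
    """Generalize every record, then suppress records in classes smaller than k."""
    generalized = [generalize(r, zip_digits) for r in records]
    # Size of each equivalence class, keyed by the quasi-identifier tuple.
    sizes = Counter(r[:3] for r in generalized)
    return [r for r in generalized if sizes[r[:3]] >= k]


if __name__ == "__main__":
    for row in k_anonymize(RAW, k=3):
        print(row)
```

With k = 3, the first six hypothetical records collapse into two equivalence classes with the same quasi-identifier values as Figure 9.1, and the outlier is suppressed; the discarded dates, ZIP digits, and record are the accuracy cost of anonymization.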
l-diversity
The main weakness of the k-anonymity model is that it can still allow the disclosure
of sensitive information: it protects only the identity of a user, not the sensitive
values associated with that user. Indeed, if all k records in an equivalence class have
the same quasi-identifier values and the same value of the sensitive attribute,
k-anonymity cannot protect that sensitive information. As an example, consider the
table in Figure 9.1. Suppose that the