Mobility Data and Privacy - Mobility Data


Quasi-Identifier attributes               Sensitive attribute
Gender    Date of Birth    ZIP Code       Disease
F         1988             561*           Flu
F         1988             561*           Flu
F         1988             561*           Flu
M         1990             910*           Heart Disease
M         1990             910*           Cold
M         1990             910*           Flu

Figure 9.1 A 3-anonymous database.

Different models have been proposed by the scientific community to achieve privacy protection while sharing and analyzing personal sensitive information. The most important privacy models are k-anonymity, l-diversity, t-closeness, randomization, and cryptography-based models.

k-anonymity

The k-anonymity model was introduced in the context of relational databases, where data are stored in a table and each row of the table corresponds to one individual. The basic idea of the k-anonymity model is to guarantee that the information of every data subject cannot be distinguished from the information of k − 1 other data subjects. The model assumes that a user's record contains the following kinds of attributes: identifiers, which explicitly identify data owners, such as name and social security number (SSN); quasi-identifiers, which could identify data owners or a small group of them (e.g., gender and ZIP code); and sensitive attributes, which represent sensitive person-specific information to be protected (e.g., disease and salary). Based on this classification, the privacy requirement defined by k-anonymity is that for each released record (e.g., a row of the table in Figure 9.1) there must be at least k − 1 other records with the same quasi-identifier values. A set of records that have the same values for the quasi-identifiers is called an equivalence class. The techniques adopted in the literature to enforce k-anonymity involve the removal of explicit identifiers and the generalization (e.g., the date of birth is changed to the year of birth), suppression (e.g., the date of birth is removed), or microaggregation (clustering and averaging) of quasi-identifiers. These techniques necessarily reduce the accuracy of the disclosed information.
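The k-anonymity requirement can be checked mechanically by grouping records into equivalence classes and taking the size of the smallest one. The following is a minimal sketch in Python, not a reference implementation: the tuple layout, the `QI_COLUMNS` constant, and the function names are assumptions introduced for illustration, and the data are the rows of the table in Figure 9.1.

```python
from collections import defaultdict

# Rows of the table in Figure 9.1, already generalized:
# (gender, year of birth, ZIP code prefix, disease).
RECORDS = [
    ("F", 1988, "561*", "Flu"),
    ("F", 1988, "561*", "Flu"),
    ("F", 1988, "561*", "Flu"),
    ("M", 1990, "910*", "Heart Disease"),
    ("M", 1990, "910*", "Cold"),
    ("M", 1990, "910*", "Flu"),
]

# Assumption for this sketch: the first three columns are the
# quasi-identifiers and the last column is the sensitive attribute.
QI_COLUMNS = (0, 1, 2)


def equivalence_classes(records, qi_columns=QI_COLUMNS):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[i] for i in qi_columns)
        classes[key].append(record)
    return dict(classes)


def anonymity_level(records, qi_columns=QI_COLUMNS):
    """Return the largest k for which the records are k-anonymous.

    This is the size of the smallest equivalence class; an empty
    table yields 0.
    """
    classes = equivalence_classes(records, qi_columns)
    if not classes:
        return 0
    return min(len(group) for group in classes.values())


def is_k_anonymous(records, k, qi_columns=QI_COLUMNS):
    """Check that every record shares its quasi-identifiers with at
    least k - 1 other records."""
    return anonymity_level(records, qi_columns) >= k
```

On the table of Figure 9.1 this yields two equivalence classes of three records each, so `is_k_anonymous(RECORDS, 3)` holds while `is_k_anonymous(RECORDS, 4)` does not.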

l-diversity

The weakness of the k-anonymity model is that it can allow the disclosure of sensitive information. In other words, it only protects the identity of a user. Indeed, if a group of k records all have the same quasi-identifier values and the same value of the sensitive attribute, k-anonymity is not able to protect the sensitive information. As an example, consider the table in Figure 9.1. Suppose that the
