Mobility Data and Privacy - Mobility Data


adversary knows that Alice was born in 1988, lives in the area with ZIP code 56123, and is in the database. He knows that Alice's record is one of the first three in the table. Since all of those patients have the same medical condition (flu), the adversary can identify Alice's disease.

To overcome this weakness, the l-diversity model requires groups of data subjects that have indistinguishable quasi-identifiers and, at the same time, an acceptable diversity of sensitive information. In particular, the main idea of this method is that every k-anonymous group should contain at least l different values for each sensitive attribute.
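A minimal Python sketch of the check described above, using "distinct" l-diversity (counting different sensitive values per group), which is the simplest variant of the model. The table is a hypothetical 3-anonymous toy example whose first group mirrors the attack on Alice; the record layout and the function names `group_by_quasi_identifiers`, `homogeneous_groups` and `is_l_diverse` are illustrative assumptions.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same (generalized) quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in quasi_identifiers)
        groups[key].append(record)
    return groups


def homogeneous_groups(records, quasi_identifiers, sensitive):
    """Return the keys of groups whose records all share a single sensitive value."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return [key for key, group in groups.items()
            if len({r[sensitive] for r in group}) == 1]


def is_l_diverse(records, quasi_identifiers, sensitive, ell):
    """Distinct l-diversity: every group holds at least `ell` distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len({r[sensitive] for r in group}) >= ell
               for group in groups.values())


# Hypothetical 3-anonymous table (birth year and ZIP code already generalized).
table = [
    {"birth": "198*", "zip": "561**", "disease": "flu"},
    {"birth": "198*", "zip": "561**", "disease": "flu"},
    {"birth": "198*", "zip": "561**", "disease": "flu"},
    {"birth": "197*", "zip": "562**", "disease": "flu"},
    {"birth": "197*", "zip": "562**", "disease": "asthma"},
    {"birth": "197*", "zip": "562**", "disease": "diabetes"},
]
qi = ["birth", "zip"]

print(homogeneous_groups(table, qi, "disease"))   # [('198*', '561**')]
print(is_l_diverse(table, qi, "disease", ell=2))  # False
```

Alice's group is the one reported as homogeneous, so this table is 3-anonymous but not 2-diverse.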

t-closeness

The problem with l-diversity is that it can be insufficient to prevent the disclosure of private information when the adversary knows the distribution of the private values. Indeed, if the adversary has prior beliefs about the private information of a data subject, he or she can compare this knowledge with the probability computed from the disclosed information. To avoid this weakness, the t-closeness model requires that, in every group of records sharing the same quasi-identifier values, the distribution of the sensitive attribute be close to the distribution of that attribute over the whole table: the distance between the two distributions must be no more than a threshold t. This limits the information gain of the adversary after an attack, because locating a data subject's group reveals little more about his or her sensitive value than the overall table already does.
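The check below sketches t-closeness for a categorical sensitive attribute. As an assumption, it measures distance with the variational distance (half the L1 distance between two distributions); for categorical values treated as equally distant, this coincides with the Earth Mover's Distance used in the original t-closeness proposal, while ordered or numerical attributes would need a different ground distance. The table and the names `distribution`, `variational_distance`, `group_distances` and `is_t_close` are hypothetical. The first group is 2-diverse, yet "cancer" is three times as frequent there as in the whole table, which is the kind of skew t-closeness is meant to catch.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def group_distances(records):
    """Distance between each group's sensitive-value distribution and the whole table's."""
    overall = distribution([sensitive for _, sensitive in records])
    groups = defaultdict(list)
    for qi_values, sensitive in records:
        groups[qi_values].append(sensitive)
    return {key: variational_distance(distribution(vals), overall)
            for key, vals in groups.items()}


def is_t_close(records, t):
    """True if every group is within distance t of the overall distribution."""
    return max(group_distances(records).values()) <= t


# Hypothetical table of (generalized quasi-identifiers, sensitive value) pairs.
records = (
    [(("198*", "561**"), "flu")]
    + [(("198*", "561**"), "cancer")] * 2
    + [(("197*", "562**"), "flu")] * 6
)

print(group_distances(records))    # roughly 0.444 for ('198*', '561**'), 0.222 for ('197*', '562**')
print(is_t_close(records, t=0.3))  # False
```

The first group would pass a 2-diversity check, yet it fails t-closeness for any t below 4/9.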

Randomization

The randomization model is based on the idea of perturbing the data to be published by adding a noise quantity. More precisely, the method can be described as follows. Denote by $X = \{x_1, \ldots, x_m\}$ the original data set. The new, distorted data set, denoted by $Z = \{z_1, \ldots, z_m\}$, is obtained by independently drawing a noise quantity $n_i$ from a known probability distribution and adding it to each record $x_i \in X$, so that $z_i = x_i + n_i$. The set of noise components is denoted by $N = \{n_1, \ldots, n_m\}$. Because the variance of the noise is assumed to be large enough, the original record values cannot easily be guessed from the distorted data. The distribution of the original data set, however, can be approximately recovered, since the distribution from which the noise is drawn is known.
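The sketch below simulates the randomization model with Gaussian noise (an assumption; the text does not fix the noise distribution) and then recovers only the mean and variance of $X$ from $Z$, using $E[Z] = E[X] + E[N]$ and $\mathrm{Var}(Z) = \mathrm{Var}(X) + \mathrm{Var}(N)$, which hold because each $n_i$ is drawn independently of the data. Recovering the full distribution requires more elaborate reconstruction procedures that this sketch does not attempt. The names `randomize` and `recover_moments` and all parameter values are hypothetical.

```python
import random
import statistics


def randomize(data, noise_std, rng):
    """Return (Z, N) with z_i = x_i + n_i, each n_i drawn independently from Normal(0, noise_std**2)."""
    noise = [rng.gauss(0.0, noise_std) for _ in data]
    distorted = [x + n for x, n in zip(data, noise)]
    return distorted, noise


def recover_moments(distorted, noise_mean, noise_var):
    """Estimate the mean and variance of X from Z alone, given the noise's mean and variance.

    Uses E[Z] = E[X] + E[N] and Var(Z) = Var(X) + Var(N), valid because N is
    independent of X.
    """
    mean_x = statistics.fmean(distorted) - noise_mean
    var_x = statistics.pvariance(distorted) - noise_var
    return mean_x, var_x


rng = random.Random(7)
X = [rng.gauss(50.0, 5.0) for _ in range(20_000)]  # hypothetical original values
Z, N = randomize(X, noise_std=20.0, rng=rng)        # noise std is 4x the data std

# Individual records are heavily distorted (mean |n_i| is about 16 here) ...
print(statistics.fmean(abs(z - x) for z, x in zip(Z, X)))
# ... yet the mean and variance of X are recovered approximately (near 50 and 25).
print(recover_moments(Z, noise_mean=0.0, noise_var=20.0 ** 2))
```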

Cryptography-Based Models

The basic idea of the privacy models based on cryptographic techniques is to compute analytical results without sharing the data, in such a way that nothing is disclosed except the final result of the analysis. In general, these models allow one to compute functions over inputs provided by multiple parties without sharing the inputs. In cryptography, this problem is addressed in the field of secure multi-party computation. As an example, consider a function f of n arguments and n different parties. If each party holds one of the n arguments, a protocol is needed that allows exchanging information and computing
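As an illustration of secure multi-party computation, the following sketch computes the simplest such function, f(a_1, ..., a_n) = a_1 + ... + a_n, with additive secret sharing. This is one textbook technique chosen for brevity, not necessarily the protocol the text goes on to describe; it simulates all parties in a single process and assumes honest-but-curious parties that do not collude. `MODULUS`, `make_shares` and `secure_sum` are hypothetical names.

```python
import secrets

# Hypothetical public modulus; it must exceed the largest possible sum of inputs.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties additive shares that sum to it modulo `modulus`.

    Any n_parties - 1 of the shares are uniformly random and reveal nothing about `value`.
    """
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate n parties computing f(a_1, ..., a_n) = a_1 + ... + a_n."""
    n = len(private_inputs)
    # Party i splits its argument a_i and sends share j to party j.
    sent = [make_shares(a, n, modulus) for a in private_inputs]
    # Party j adds up the shares it received; it never sees any a_i in the clear.
    partial_sums = [sum(sent[i][j] for i in range(n)) % modulus for j in range(n)]
    # Only the partial sums are published; combined, they yield just the result.
    return sum(partial_sums) % modulus


print(secure_sum([12, 30, 7]))  # 49
```

Each party learns the final sum but, under the stated assumptions, nothing else about the other parties' arguments.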
