adversary knows that Alice was born in 1988, lives in the area with ZIP code
56123 and is in the database. He knows that Alice's record is one of the first
three in the table. Since all of those patients have the same medical condition
(flu), the adversary can identify Alice's disease.
To overcome this weakness, the l-diversity model requires groups of data
subjects with indistinguishable quasi-identifiers and an acceptable diversity
of sensitive information. In particular, the main idea of this method is that
every k-anonymous group should contain at least l different values for each
sensitive attribute, that is, each attribute containing personal information.
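
The requirement above can be checked mechanically. The following is a minimal sketch, assuming the simplest reading of the model (counting distinct sensitive values per group); the table contents, the column names and the function name is_l_diverse are hypothetical and only loosely echo the Alice example.

```python
from collections import defaultdict

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Return True if every group of records that share the same
    quasi-identifier values holds at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())

# Hypothetical generalized table: the first three rows form one group
# in which every patient has the same disease.
table = [
    {"birth": "198*", "zip": "5612*", "disease": "flu"},
    {"birth": "198*", "zip": "5612*", "disease": "flu"},
    {"birth": "198*", "zip": "5612*", "disease": "flu"},
    {"birth": "197*", "zip": "5613*", "disease": "asthma"},
    {"birth": "197*", "zip": "5613*", "disease": "diabetes"},
    {"birth": "197*", "zip": "5613*", "disease": "flu"},
]

qi = ["birth", "zip"]
fails_2_diversity = not is_l_diverse(table, qi, "disease", 2)
```

In this hypothetical table the first group contains only one distinct disease, so the check fails for l = 2 even though each group holds three records.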
t-closeness
The problem with l-diversity is that it can be insufficient to prevent the
disclosure of private information when the adversary knows the distribution of
the private values. Indeed, if the adversary has a prior belief about the
private information of a data subject, he or she can compare this belief with
the probability computed from the disclosed information. To avoid this
weakness, the t-closeness model requires that, in any group of records sharing
the same quasi-identifier values, the distribution of the values of a sensitive
attribute be close to the distribution of that attribute in the overall table.
The distance between the two distributions should be no more than a threshold
t. This limits the information the adversary can gain from an attack.
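
The text does not say which distance is used. As an illustration only, the sketch below assumes the total variation distance between categorical distributions; the function names and the sample table are hypothetical.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Total variation distance (an assumed choice of distance)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(records, quasi_identifiers, sensitive, t):
    """True if every group's sensitive-value distribution lies within
    distance t of the distribution over the whole table."""
    overall = distribution([r[sensitive] for r in records])
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].append(r[sensitive])
    return all(
        total_variation(distribution(vals), overall) <= t
        for vals in groups.values()
    )

# Hypothetical table with two quasi-identifier groups.
rows = [
    {"group": "A", "disease": "flu"},
    {"group": "A", "disease": "flu"},
    {"group": "A", "disease": "flu"},
    {"group": "B", "disease": "flu"},
    {"group": "B", "disease": "cancer"},
    {"group": "B", "disease": "asthma"},
]

loose = satisfies_t_closeness(rows, ["group"], "disease", 0.5)
strict = satisfies_t_closeness(rows, ["group"], "disease", 0.2)
```

A smaller t forces each group to resemble the overall table more closely, at the cost of more generalization of the quasi-identifiers.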
Randomization
The randomization model is based on the idea of perturbing the data to be
published by adding noise. More technically, this method can be described as
follows. Denote by X = {x_1, ..., x_m} the original data set. The new distorted
data set, denoted by Z = {z_1, ..., z_m}, is obtained by drawing a noise
quantity n_i independently from a probability distribution and adding it to
each record x_i ∈ X, so that z_i = x_i + n_i. The set of noise components is
denoted by N = {n_1, ..., n_m}. The original record values cannot be easily
guessed from the distorted data, provided the variance of the noise is large
enough. The distribution of the original data set, by contrast, can still be
recovered easily.
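
The perturbation step can be sketched as follows. The text does not fix the noise distribution, so zero-mean Gaussian noise with a known standard deviation is assumed here; the data, the parameter values and the function name randomize are hypothetical. The sketch only recovers two summary statistics of X, not its full distribution.

```python
import random
import statistics

def randomize(x, sigma, rng):
    """Return (z, n) with z_i = x_i + n_i, where each n_i is drawn
    independently from a zero-mean Gaussian with standard deviation sigma."""
    n = [rng.gauss(0.0, sigma) for _ in x]
    z = [xi + ni for xi, ni in zip(x, n)]
    return z, n

rng = random.Random(0)                              # fixed seed for repeatability
x = [rng.uniform(20, 80) for _ in range(10_000)]    # hypothetical original values
sigma = 50.0                                        # noise spread, large vs. the data
z, n = randomize(x, sigma, rng)

# Individual records are heavily distorted, but aggregate properties survive:
# the noise has mean zero, so mean(Z) is close to mean(X), and for independent
# noise var(Z) = var(X) + sigma^2, so var(X) can be estimated from Z alone.
mean_gap = abs(statistics.fmean(z) - statistics.fmean(x))
estimated_var_x = statistics.pvariance(z) - sigma ** 2
true_var_x = statistics.pvariance(x)
```

The estimate relies on the publisher disclosing the noise distribution; the individual values n_i are never released.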
Cryptography-Based Models
The basic idea of the privacy models based on cryptographic techniques is to
compute analytical results without sharing the data, in such a way that nothing
is disclosed except the final result of the analysis. In general, these models
allow one to compute functions over inputs provided by multiple parties without
sharing the inputs. This problem is addressed in cryptography in the field of
secure multi-party computation. As an example, consider a function f of n
arguments and n different parties. If each party has one of the n arguments,
a protocol is needed that allows exchanging information and computing