2.2 Adversarial Attacks on Randomization
In the earlier section on privacy quantification, we illustrated an example in
which the reconstructed distribution on the data can be used in order to reduce
the privacy of the underlying data record. In general, a systematic approach
can be used to do this in multi-dimensional data sets with the use of spectral
filtering or PCA based techniques [50, 62]. The broad idea in techniques such
as PCA [50] is that the correlation structure in the original data can be
estimated fairly accurately (in larger data sets) even after noise addition. Once
the broad correlation structure in the data has been determined, one can then
try to remove the noise in the data in such a way that it fits the aggregate
correlation structure of the data. It has been shown that such techniques can
reduce the privacy of the perturbation process significantly since the noise
removal results in values which are fairly close to their original values [50, 62].
Some other discussions on limiting breaches of privacy in the randomization
method may be found in [43].
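The noise-removal idea described above can be illustrated with a minimal sketch. It is not the exact procedure of [50] or [62]; it assumes independent Gaussian noise of known variance, synthetic low-rank data, and a known number of principal components. The names `pca_filter`, `noise_var`, and `k` are hypothetical and chosen only for this example.

```python
import numpy as np

def pca_filter(Z, noise_var, k):
    """Project perturbed records Z (n x d) onto the top-k principal directions."""
    mean = Z.mean(axis=0)
    Zc = Z - mean
    cov_z = np.cov(Zc, rowvar=False)
    # With independent additive noise, Cov(Z) = Cov(X) + noise_var * I.
    # Subtracting the noise term corrects the eigenvalues; for isotropic
    # noise the eigenvectors are the same as those of Cov(Z).
    cov_x = cov_z - noise_var * np.eye(Z.shape[1])
    eigvals, eigvecs = np.linalg.eigh(cov_x)  # eigenvalues in ascending order
    top = eigvecs[:, -k:]                     # directions of largest variance
    return Zc @ top @ top.T + mean

# Synthetic, strongly correlated data (an assumption made for illustration).
rng = np.random.default_rng(0)
n, d, k = 5000, 10, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))
noise_var = 1.0
Z = X + rng.normal(scale=np.sqrt(noise_var), size=(n, d))

X_hat = pca_filter(Z, noise_var, k)
err_before = np.mean((Z - X) ** 2)      # mean squared error of perturbed data
err_after = np.mean((X_hat - X) ** 2)   # mean squared error after filtering
print(err_before, err_after)
```

In this setting the filtered values lie closer to the original values than the perturbed ones do, because the projection discards the noise components that fall outside the estimated correlation structure.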
A second kind of adversarial attack makes use of public information.
Consider a record $X = (x_1 \ldots x_d)$, which is perturbed to
$Z = (z_1 \ldots z_d)$. Then, since the distribution of the perturbations is
known, we can try to use a maximum likelihood fit of the potential perturbation
of $Z$ to a public record. Consider the public record $W = (w_1 \ldots w_d)$.
Then, the potential perturbation of $Z$ with respect to $W$ is given by
$(Z - W) = (z_1 - w_1 \ldots z_d - w_d)$. Each of these values $(z_i - w_i)$
should fit the distribution $f_Y(y)$. The corresponding log-likelihood fit is
given by $\sum_{i=1}^{d} \log(f_Y(z_i - w_i))$. The higher the
log-likelihood fit, the greater the probability that the record W corresponds
to X . If it is known that the public data set always includes X , then the
maximum likelihood fit can provide a high degree of certainty in identifying
the correct record, especially in cases where d is large. We will discuss this
issue in greater detail in a later section.
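The maximum likelihood matching described above can be sketched as follows. The sketch assumes zero-mean Gaussian perturbations with a known standard deviation `sigma`, and a synthetic public data set that contains the original record; the function names and the data are hypothetical and serve only as an illustration.

```python
import math
import random

def gaussian_log_pdf(y, sigma):
    """log f_Y(y) for zero-mean Gaussian noise (an assumed noise model)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - y ** 2 / (2 * sigma ** 2)

def log_likelihood_fit(z, w, sigma):
    """Sum over i of log f_Y(z_i - w_i) for perturbed record z and public record w."""
    return sum(gaussian_log_pdf(zi - wi, sigma) for zi, wi in zip(z, w))

def best_match(z, public_records, sigma):
    """Index of the public record with the highest log-likelihood fit."""
    scores = [log_likelihood_fit(z, w, sigma) for w in public_records]
    return max(range(len(public_records)), key=scores.__getitem__)

random.seed(0)
d, n, sigma = 50, 200, 1.0
# Synthetic public data set; the true record is assumed to be one of its rows.
public_records = [[random.uniform(0, 10) for _ in range(d)] for _ in range(n)]
true_index = 17
x = public_records[true_index]
z = [xi + random.gauss(0, sigma) for xi in x]  # perturbed record Z = X + Y

guess = best_match(z, public_records, sigma)
print(guess == true_index)
```

With many dimensions, the fit of the correct record separates sharply from that of every other candidate, which is why the attack becomes more reliable as d grows.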
2.3 Randomization Methods for Data Streams
The randomization approach is particularly well suited to privacy-preserving
data mining of streams, since the noise added to a given record is indepen-
dent of the rest of the data. However, streams provide a particularly vul-
nerable target for adversarial attacks with the use of PCA based techniques
[50] because of the large volume of the data available for analysis. In [73],
an interesting technique for randomization has been proposed which uses the
auto-correlations in different time series while deciding the noise to be added
to any particular value. It has been shown in [73] that such an approach
is more robust since the noise correlates with the stream behavior, and it is
more difficult to create effective adversarial attacks with the use of correlation
analysis techniques.
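The details of the method in [73] are not given here, so the following is only a rough sketch of the general idea of noise that shares the correlation behavior of the stream. It assumes a single stream, uses only the lag-1 auto-correlation, and generates noise from a first-order autoregressive model; `lag1_autocorr` and `ar1_noise` are hypothetical helper names, and the stream itself is synthetic.

```python
import random

def lag1_autocorr(series):
    """Sample lag-1 auto-correlation of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

def ar1_noise(n, rho, sigma, rng):
    """AR(1) noise with lag-1 correlation rho and marginal std sigma."""
    innov_std = sigma * (1 - rho ** 2) ** 0.5  # keeps the marginal variance at sigma^2
    noise = [rng.gauss(0, sigma)]
    for _ in range(n - 1):
        noise.append(rho * noise[-1] + rng.gauss(0, innov_std))
    return noise

rng = random.Random(0)
# Synthetic auto-correlated stream (an assumption made for illustration).
stream = [0.0]
for _ in range(4999):
    stream.append(0.9 * stream[-1] + rng.gauss(0, 1.0))

rho = lag1_autocorr(stream)                       # estimated from the stream
noise = ar1_noise(len(stream), rho, 1.0, rng)     # noise mimics that correlation
perturbed = [s + e for s, e in zip(stream, noise)]
print(rho, lag1_autocorr(noise))
```

Because the noise here has roughly the same auto-correlation as the stream, a filter that relies on correlation structure has less to distinguish the noise from the signal than it would with independent noise.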