2.4 Multiplicative Perturbations
The most common method of randomization is that of additive perturba-
tions. However, multiplicative perturbations can also be used to good effect
for privacy-preserving data mining. Many of these techniques have their roots
in the work of [57], which shows how to use multi-dimensional projections in
order to reduce the dimensionality of the data. This technique preserves the
inter-record distances approximately, and therefore the transformed records
can be used in conjunction with a variety of data mining applications. In par-
ticular, the approach is discussed in detail in [87, 88], in which it is shown how
to use the method for privacy-preserving clustering. The technique can also
be applied to the problem of classification as discussed in [25]. Multiplicative
perturbations can also be used for distributed privacy-preserving data mining.
Details can be found in [75]. A number of techniques for multiplicative pertur-
bation in the context of masking census data may be found in [66]. A variation
on this theme may be implemented with the use of distance preserving Fourier
transforms, which work effectively for a variety of cases [82].
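The distance-preserving behavior of a projection-based perturbation can be illustrated with a minimal sketch. It assumes a Gaussian random projection matrix with entries drawn from N(0, 1/k), which is one common construction and not necessarily the one used in [57]; the helper names, the dimensions d and k, and the synthetic records are all hypothetical.

```python
import itertools
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Multiply the length-d record x by the k x d matrix R."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


def distance(a, b):
    """Euclidean distance between two records of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
d, k, n = 1000, 200, 5  # original dimension, reduced dimension, record count

# Synthetic records standing in for the private data.
records = [[rng.uniform(0.0, 100.0) for _ in range(d)] for _ in range(n)]

# The data owner keeps R secret and releases only the projected records.
R = random_projection_matrix(d, k, rng)
perturbed = [project(R, x) for x in records]

# Ratio of perturbed distance to original distance for every pair of records.
ratios = [
    distance(perturbed[i], perturbed[j]) / distance(records[i], records[j])
    for i, j in itertools.combinations(range(n), 2)
]
print(min(ratios), max(ratios))
```

The ratios should stay close to 1, which is why distance-based methods such as clustering can be run on the perturbed records. A smaller k loosens the approximation.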
As in the case of additive perturbations, multiplicative perturbations are
not entirely safe from adversarial attacks. In general, if the attacker has no
prior knowledge of the data, then it is relatively difficult to attack the privacy
of the transformation. However, with some prior knowledge, two kinds of
attacks are possible [76]:
Known Input-Output Attack: In this case, the attacker knows some
linearly independent collection of records and their corresponding
perturbed versions. In such cases, linear algebra techniques can be used to
reverse-engineer the nature of the privacy-preserving transformation.
Known Sample Attack: In this case, the attacker has a collection of in-
dependent data samples from the same distribution from which the original
data was drawn. In such cases, principal component analysis techniques
can be used in order to reconstruct the behavior of the original data.
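The known input-output attack can be sketched as follows, assuming the perturbation is y = Mx for a secret invertible d x d matrix M and that the attacker holds d linearly independent records together with their perturbed versions. Under that assumption, any perturbed record is a linear combination of the known perturbed records, and the same coefficients applied to the known originals recover the hidden record. The function names and the synthetic data are hypothetical, and this is only an illustration of the linear algebra involved, not the specific procedure of [76].

```python
import random


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("known records are not linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def matvec(M, x):
    """Multiply the square matrix M by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


rng = random.Random(1)
d = 4

# Secret transformation, known only to the data owner (assumed invertible).
M = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

# The attacker's prior knowledge: d original records and their perturbed versions.
known_x = [[rng.uniform(0.0, 10.0) for _ in range(d)] for _ in range(d)]
known_y = [matvec(M, x) for x in known_x]

# A record the attacker sees only in perturbed form.
target_x = [rng.uniform(0.0, 10.0) for _ in range(d)]
target_y = matvec(M, target_x)

# Express target_y as a combination of the known perturbed records
# (the known_y vectors form the columns of the system matrix) ...
Y_cols = [[known_y[j][i] for j in range(d)] for i in range(d)]
coeffs = solve(Y_cols, target_y)

# ... and apply the same combination to the known original records.
recovered = [sum(coeffs[j] * known_x[j][i] for j in range(d)) for i in range(d)]
print(recovered)
print(target_x)
```

The attacker never computes M explicitly here. Because the transformation is linear, one linear solve per target record is enough.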
2.5 Data Swapping
We note that noise addition or multiplication is not the only technique which
can be used to perturb the data. A related method is that of data swapping,
in which values are swapped across different records in order to
preserve privacy [45]. One advantage of this technique is that the
lower order marginal totals of the data are completely preserved and are not
perturbed at all. Therefore certain kinds of aggregate computations can be
exactly performed without violating the privacy of the data. We note that this
technique does not follow the general principle in randomization which allows
the value of a record to be perturbed independently of the other records.
Therefore, this technique can be used in combination with other frameworks
such as k-anonymity, as long as the swapping process is designed to preserve
the definitions of privacy for that model.
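A minimal sketch of the idea, assuming the simplest possible scheme: the values of a single sensitive attribute are randomly permuted across records. This is an illustrative simplification and not the specific procedure of [45]; the function name, attribute names, and records are hypothetical. It preserves the count of each value of every attribute exactly, but it does not by itself preserve joint counts between the swapped attribute and the others, which practical schemes address by constraining which records may exchange values.

```python
import random
from collections import Counter


def swap_attribute(records, attr, rng):
    """Return copies of the records with the values of `attr` permuted across them."""
    values = [r[attr] for r in records]
    rng.shuffle(values)
    return [{**r, attr: v} for r, v in zip(records, values)]


# Hypothetical records; "diagnosis" plays the role of the sensitive attribute.
records = [
    {"age": 34, "zip": "10001", "diagnosis": "flu"},
    {"age": 51, "zip": "10002", "diagnosis": "diabetes"},
    {"age": 29, "zip": "10001", "diagnosis": "flu"},
    {"age": 62, "zip": "10003", "diagnosis": "asthma"},
    {"age": 45, "zip": "10002", "diagnosis": "diabetes"},
]

swapped = swap_attribute(records, "diagnosis", random.Random(7))

# The per-attribute counts are identical before and after swapping.
before = Counter(r["diagnosis"] for r in records)
after = Counter(r["diagnosis"] for r in swapped)
print(before == after)
```

Aggregate queries that depend only on these per-attribute counts therefore return exactly the same answer on the swapped data as on the original.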