technique in [7] analyzes the k-anonymity method in the presence of increasing
dimensionality. The curse of dimensionality becomes especially important
when adversaries may have considerable background information, as a result
of which the boundary between pseudo-identifiers and sensitive attributes may
become blurred. This is generally true, since adversaries may be familiar with
the subject of interest and may have more information about them than
what is publicly available. This is also the motivation for techniques such as
l-diversity [77], which guard against further privacy attacks that exploit
background knowledge. The work in [7] concludes that in order to maintain
privacy, a large number of the attributes may need to be suppressed. Thus, the
data loses its utility for the purpose of data mining algorithms. The broad
intuition behind the result in [7] is that when attributes are generalized into
wide ranges, the combination of a large number of generalized attributes is so
sparsely populated that even 2-anonymity becomes increasingly unlikely.
While the method of l-diversity has not been formally analyzed, some
observations made in [77] seem to suggest that the method becomes increasingly
infeasible to implement effectively with increasing dimensionality.
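The sparsity intuition can be illustrated with a small simulation. The sketch below is not the analysis of [7]; it assumes uniformly distributed attributes, equal-width generalization ranges, and a hypothetical helper name (`fraction_2_anonymous`), and it simply counts how many records still share their combination of generalized values with at least one other record as the number of attributes grows.

```python
import random
from collections import Counter


def fraction_2_anonymous(n_records, n_dims, n_ranges, seed=0):
    """Fraction of records whose generalized cell holds at least 2 records.

    Assumptions (illustrative only): every attribute is uniform on [0, 1)
    and is generalized into n_ranges equal-width ranges.
    """
    rng = random.Random(seed)
    # Map each record to the tuple of range indices of its attributes.
    cells = [
        tuple(int(rng.random() * n_ranges) for _ in range(n_dims))
        for _ in range(n_records)
    ]
    counts = Counter(cells)
    # A record is 2-anonymous if some other record falls in the same cell.
    return sum(1 for cell in cells if counts[cell] >= 2) / n_records


if __name__ == "__main__":
    for d in (1, 2, 5, 10, 20):
        print(d, fraction_2_anonymous(1000, d, n_ranges=4))
```

Even with only four wide ranges per attribute, the number of possible cells grows as 4^d, so for a fixed number of records the fraction that remains 2-anonymous should fall toward zero as d increases.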
The method of randomization has also been analyzed in [10]. This paper
makes a first analysis of the ability to re-identify data records with
the use of maximum likelihood estimates. Consider a d-dimensional record
$X = (x_1 \ldots x_d)$, which is perturbed to $Z = (z_1 \ldots z_d)$. For a given public
record $W = (w_1 \ldots w_d)$, we would like to find the probability that it could
have been perturbed to $Z$ using the perturbing distribution $f_Y(y)$. If this were
true, then the set of values given by $(Z - W) = (z_1 - w_1 \ldots z_d - w_d)$ should be
all drawn from the distribution $f_Y(y)$. The corresponding log-likelihood fit is
given by $\sum_{i=1}^{d} \log(f_Y(z_i - w_i))$. The higher the log-likelihood fit, the greater
the probability that the record W corresponds to X. In order to achieve
greater anonymity, we would like the perturbations to be large enough that
some of the spurious records in the data have a greater log-likelihood fit to
Z than the true record X. It has been shown in [10] that this probability
decreases rapidly with increasing dimensionality for different kinds of
perturbing distributions. Thus, the randomization technique also seems to be
susceptible to the curse of high dimensionality.
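The log-likelihood matching described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [10]: the perturbing distribution is taken to be a zero-mean Gaussian with standard deviation `sigma`, the public records are drawn uniformly from [0, 1), and the function names are hypothetical.

```python
import math
import random


def log_likelihood_fit(z, w, sigma):
    """Sum over i of log f_Y(z_i - w_i), with f_Y assumed to be N(0, sigma^2)."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (zi - wi) ** 2 / (2 * sigma ** 2) for zi, wi in zip(z, w))


def true_record_rank(n_public, n_dims, sigma, seed=0):
    """Rank of the true record among public records by log-likelihood fit.

    Rank 1 means no spurious record fits the perturbed record Z better
    than the true record X, i.e. the adversary re-identifies X.
    """
    rng = random.Random(seed)
    # Hypothetical public database; record 0 plays the role of X.
    public = [[rng.random() for _ in range(n_dims)] for _ in range(n_public)]
    x = public[0]
    # Perturb X with additive Gaussian noise to obtain Z.
    z = [xi + rng.gauss(0.0, sigma) for xi in x]
    fits = [log_likelihood_fit(z, w, sigma) for w in public]
    return 1 + sum(1 for fit in fits[1:] if fit > fits[0])


if __name__ == "__main__":
    for d in (1, 5, 20, 100):
        print(d, true_record_rank(n_public=200, n_dims=d, sigma=0.2))
```

In this setup the true record tends to move toward rank 1 as the dimensionality grows, which is the effect the text describes: spurious records become less and less likely to fit Z better than X.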
We note that the problem of high dimensionality seems to be a fundamental
one for privacy preservation, and it is unlikely that more effective methods
can be found in order to preserve privacy when background information about
a large number of features is available to even a subset of selected individuals.
Indirect examples of such violations occur with the use of trail identifications
[78, 79], where information from multiple sources can be compiled to create a
high-dimensional feature representation which violates privacy.