Databases Reference
In-Depth Information
the method in [98] assumes that the value generalization hierarchy is a tree,
whereas that in [71, 70] assumes that it is a graph.
Two interesting methods for top-down specialization and bottom-up gener-
alization for k -anonymity have been proposed in [46, 107]. In [46], a top-down
heuristic is designed, which starts with a general solution, and then specializes
some attributes of the current solution so as to increase the information, but
reduce the anonymity. The reduction in anonymity is always controlled, so
that k -anonymity is never violated. At the same time each step of the spe-
cialization is controlled by a goodness metric which takes into account both
the gain in information and the loss in anonymity. A complementary method
to top down specialization is that of bottom up generalization , for which an
interesting method is proposed in [107].
We note that generalization and suppression are not the only transfor-
mation techniques for implementing k -anonymity. For example in [35] it is
discussed how to use micro-aggregation in which clusters of records are con-
structed. For each cluster, its representative value is the average value along
each dimension in the cluster. A similar method for achieving anonymity via
clustering is proposed in [13]. The work in [13] also provides constant factor
approximation algorithms to design the clustering. In [8], a related method
has been independently proposed for condensation based privacy-preserving
data mining. This technique generates pseudo-data from clustered groups of
k -records. The process of pseudo-data generation uses principal component
analysis of the behavior of the records within a group. It has been shown in
[8], that the approach can be effectively used for the problem of classification.
We note that the use of pseudo-data provides an additional layer of protec-
tion, since it is dicult to perform adversarial attacks on synthetic data. At
the same time, the aggregate behavior of the data is preserved, and this can
be useful for a variety of data mining problems.
Since the problem of k -anonymization is essentially a search over a space
of possible multi-dimensional solutions, standard heuristic search techniques
such as genetic algorithms or simulated annealing can be effectively used.
Such a technique has been proposed in [112] in which a simulated annealing
algorithm is used in order to generate k -anonymous representations of the
data. Another technique proposed in [55] uses genetic algorithms in order to
construct k -anonymous representations of the data. Both of these techniques
require high computational times, and provide no guarantees on the quality
of the solutions found.
The only known techniques which provide guarantees on the quality of the
solution are approximation algorithms [11, 12, 80], in which the solution found
is guaranteed to be within a certain factor of the cost of the optimal solution.
An approximation algorithm for k -anonymity was proposed in [80], and it
provides an O ( k
log k ) optimal solution. A number of techniques have also
been proposed in [11, 12], which provide O ( k )-approximations to the optimal
cost k -anonymous solutions.
ยท
Search WWH ::




Custom Search