Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends

Databases Reference

In-Depth Information

the method in [98] assumes that the value generalization hierarchy is a tree,

whereas that in [71, 70] assumes that it is a graph.

Two interesting methods for top-down specialization and bottom-up gener-

alization for k -anonymity have been proposed in [46, 107]. In [46], a top-down

heuristic is designed, which starts with a general solution, and then specializes

some attributes of the current solution so as to increase the information, but

reduce the anonymity. The reduction in anonymity is always controlled, so

that k -anonymity is never violated. At the same time each step of the spe-

cialization is controlled by a goodness metric which takes into account both

the gain in information and the loss in anonymity. A complementary method

to top down specialization is that of bottom up generalization , for which an

interesting method is proposed in [107].

We note that generalization and suppression are not the only transfor-

mation techniques for implementing k -anonymity. For example in [35] it is

discussed how to use micro-aggregation in which clusters of records are con-

structed. For each cluster, its representative value is the average value along

each dimension in the cluster. A similar method for achieving anonymity via

clustering is proposed in [13]. The work in [13] also provides constant factor

approximation algorithms to design the clustering. In [8], a related method

has been independently proposed for condensation based privacy-preserving

data mining. This technique generates pseudo-data from clustered groups of

k -records. The process of pseudo-data generation uses principal component

analysis of the behavior of the records within a group. It has been shown in

[8], that the approach can be effectively used for the problem of classification.

We note that the use of pseudo-data provides an additional layer of protec-

tion, since it is dicult to perform adversarial attacks on synthetic data. At

the same time, the aggregate behavior of the data is preserved, and this can

be useful for a variety of data mining problems.

Since the problem of k -anonymization is essentially a search over a space

of possible multi-dimensional solutions, standard heuristic search techniques

such as genetic algorithms or simulated annealing can be effectively used.

Such a technique has been proposed in [112] in which a simulated annealing

algorithm is used in order to generate k -anonymous representations of the

data. Another technique proposed in [55] uses genetic algorithms in order to

construct k -anonymous representations of the data. Both of these techniques

require high computational times, and provide no guarantees on the quality

of the solutions found.

The only known techniques which provide guarantees on the quality of the

solution are approximation algorithms [11, 12, 80], in which the solution found

is guaranteed to be within a certain factor of the cost of the optimal solution.

An approximation algorithm for k -anonymity was proposed in [80], and it

provides an O ( k

log k ) optimal solution. A number of techniques have also

been proposed in [11, 12], which provide O ( k )-approximations to the optimal

cost k -anonymous solutions.

·

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home