Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends

Databases Reference

In-Depth Information

k -anonymity has been widely researched. A good overview and survey of the

corresponding algorithms may be found in [28].

We note that the problem of optimal anonymization is inherently a dicult

one. In [80], it has been shown that the problem of optimal k -anonymization

is NP-hard. Nevertheless, the problem can be solved quite effectively by the

use of a number of heuristic methods. A method proposed by Bayardo and

Agrawal [16] is the k-Optimize algorithm which can often obtain effective

solutions.

The approach assumes an ordering among the quasi-identifier attributes.

The values of the attributes are discretized into intervals (quantitative at-

tributes) or grouped into different sets of values (categorical attributes). Each

such grouping is an item . For a given attribute, the corresponding items are

also ordered. An index is created using these attribute-interval pairs (or items)

and a set enumeration tree is constructed on these attribute-interval pairs.

This set enumeration tree is a systematic enumeration of all possible general-

izations with the use of these groupings. The root of the node is the null node,

and every successive level of the tree is constructed by appending one item

which is lexicographically larger than all the items at that node of the tree.

We note that the number of possible nodes in the tree increases exponentially

with the data dimensionality. Therefore, it is not possible to build the entire

tree even for modest values of n . However, the k -Optimize algorithm can use

a number of pruning strategies to good effect. In particular, a node of the

tree can be pruned when it is determined that no descendent of it could be

optimal. This can be done by computing a bound on the quality of all descen-

dents of that node, and comparing it to the quality of the current best solution

obtained during the traversal process. A branch and bound technique can be

used to successively improve the quality of the solution during the traversal

process. Eventually, it is possible to terminate the algorithm at a maximum

computational time, and use the current solution at that point, which is often

quite good, but may not be optimal.

In [70], the Incognito method has been proposed for computing a k -minimal

generalization with the use of bottom-up aggregation along domain generaliza-

tion hierarchies. The Incognito method uses a bottom-up breadth-first search

of the domain generalization hierarchy, in which it generates all the possi-

ble minimal k -anonymous tables for a given private table. First, it checks

k -anonymity for each single attribute, and removes all those generalizations

which do not satisfy k -anonymity. Then, it computes generalizations in pairs,

again pruning those pairs which do not satisfy the k -anonymity constraints.

In general, the Incognito algorithm computes ( i + 1)-dimensional generaliza-

tion candidates from the i -dimensional generalizations, and removes all those

those generalizations which do not satisfy the k -anonymity constraint. This

approach is continued until, no further candidates can be constructed, or all

possible dimensions have been exhausted. We note that the methods in [71, 70]

use a more general model for k -anonymity than that in [98]. This is because

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home