Databases Reference
In-Depth Information
k -anonymity has been widely researched. A good overview and survey of the
corresponding algorithms may be found in [28].
We note that the problem of optimal anonymization is inherently a dicult
one. In [80], it has been shown that the problem of optimal k -anonymization
is NP-hard. Nevertheless, the problem can be solved quite effectively by the
use of a number of heuristic methods. A method proposed by Bayardo and
Agrawal [16] is the k-Optimize algorithm which can often obtain effective
solutions.
The approach assumes an ordering among the quasi-identifier attributes.
The values of the attributes are discretized into intervals (quantitative at-
tributes) or grouped into different sets of values (categorical attributes). Each
such grouping is an item . For a given attribute, the corresponding items are
also ordered. An index is created using these attribute-interval pairs (or items)
and a set enumeration tree is constructed on these attribute-interval pairs.
This set enumeration tree is a systematic enumeration of all possible general-
izations with the use of these groupings. The root of the node is the null node,
and every successive level of the tree is constructed by appending one item
which is lexicographically larger than all the items at that node of the tree.
We note that the number of possible nodes in the tree increases exponentially
with the data dimensionality. Therefore, it is not possible to build the entire
tree even for modest values of n . However, the k -Optimize algorithm can use
a number of pruning strategies to good effect. In particular, a node of the
tree can be pruned when it is determined that no descendent of it could be
optimal. This can be done by computing a bound on the quality of all descen-
dents of that node, and comparing it to the quality of the current best solution
obtained during the traversal process. A branch and bound technique can be
used to successively improve the quality of the solution during the traversal
process. Eventually, it is possible to terminate the algorithm at a maximum
computational time, and use the current solution at that point, which is often
quite good, but may not be optimal.
In [70], the Incognito method has been proposed for computing a k -minimal
generalization with the use of bottom-up aggregation along domain generaliza-
tion hierarchies. The Incognito method uses a bottom-up breadth-first search
of the domain generalization hierarchy, in which it generates all the possi-
ble minimal k -anonymous tables for a given private table. First, it checks
k -anonymity for each single attribute, and removes all those generalizations
which do not satisfy k -anonymity. Then, it computes generalizations in pairs,
again pruning those pairs which do not satisfy the k -anonymity constraints.
In general, the Incognito algorithm computes ( i + 1)-dimensional generaliza-
tion candidates from the i -dimensional generalizations, and removes all those
those generalizations which do not satisfy the k -anonymity constraint. This
approach is continued until, no further candidates can be constructed, or all
possible dimensions have been exhausted. We note that the methods in [71, 70]
use a more general model for k -anonymity than that in [98]. This is because
Search WWH ::




Custom Search