Database Reference
In-Depth Information
namely heuristic approaches and exact approaches. In this section, we review some
of the most popular approaches that belong to each class. Furthermore, we devote
a section to discuss border-based approaches, an important category of heuristic
approaches that has also influenced the design of exact hiding algorithms. Due to
their importance, in what follows we refer to border-based approaches as the third
class of ARH algorithms.
Heuristic approaches involve efficient, fast algorithms that selectively sanitize a
set of transactions from the database to hide the sensitive knowledge. Due to their
efficiency and scalability, the heuristic approaches have been the focus of attention
for the vast majority of researchers in the knowledge hiding field. However, there are
several circumstances in which they suffer from undesirable side-effects that lead
them to poor solutions.
Border-based approaches consider the task of sensitive rule hiding through modi-
fication of the original borders in the lattice of the frequent and the infrequent patterns
in the dataset. In these schemes, the sensitive knowledge is hidden by enforcing the
revised borders (which accommodate the hiding of the sensitive itemsets) in the san-
itized database. The algorithms in this class differ both in the borders that they track
and use for the hiding strategy, as well as in the methodology that they follow to
enforce the revised borders in the sanitized dataset.
Finally, exact approaches contain non-heuristic algorithms which conceive the
hiding process as a constraint satisfaction problem that they solve by using integer
or linear programming. The main difference of these approaches, compared to the
previous ones, is the fact that the sanitization process guarantees optimality in the
hiding solution, provided that an optimal solution exists. On the other hand, these
approaches are usually several orders of magnitude slower than the heuristic ones,
particularly due to the runtime of the integer/linear programming solver. For this,
they are applicable only in small and medium-size datasets.
Heuristic Approaches In this section, we review support-based and confidence-
based heuristic approaches, which are based on either distortion or blocking of the
original values. Between these two categories of approaches, the distortion-based
are the ones commonly adopted by the overwhelming majority of researchers.
Support-based and Confidence-based Distortion Schemes Atallah et al. [ 11 ]
were the first to propose an algorithm for the hiding of sensitive association rules
through the reduction in the support of their generating itemsets. The authors pro-
pose the construction of a lattice-like graph in the database. Through this graph,
the hiding of a large itemset, related to the existence of a sensitive rule, is achieved
by a greedy iterative traversal of its immediate subsets, selection of the subset that
has the maximum support among all candidates (therefore is less probable to be
hidden) and setting of this itemset as the new candidate to be hidden. By iteratively
following these steps, the algorithm identifies the 1-itemset ancestor of the initial
sensitive itemset, having the highest support. Then, by identifying the supporting
transactions for both the initial candidate and the currently identified 1-itemset, the
algorithm removes the 1-itemset from the supporting transaction which affects the
least number of 2-itemsets. In sequel, the algorithm propagates the results of this
Search WWH ::




Custom Search