Database Reference
In-Depth Information
The third dimension, refers to whether a single sensitive rule or a set of sensitive
rules can be hidden during an iteration of the hiding algorithm. Based on this criterion
we differentiate hiding algorithms into single rule and multiple rule schemes.
The fourth dimension has to do with the nature of the hiding algorithm, which
can be either heuristic or exact . Heuristic algorithms take decisions that aim at
optimizing certain sub-goals in the hiding process, are computationally efficient,
but do not guarantee optimality. The formulation of the ARH problem (presented in
Sect. 3.1) implies that there are two specific sub-goals that need to be attained by
every ARH algorithm. The first sub-goal [ (a) ], which is basically the most important,
is to try to hide as many sensitive rules as possible. The second sub-goal [ (b), (c), (d) ]
is to manage to hide the sensitive rules by minimizing side-effects. Different hiding
algorithms give different priorities to the satisfaction of the sub-goals presented,
producing in this way a list of hiding primitives.
Exact techniques, on the other hand, rely on formulating the ARH problem so that
a solution satisfying all the sub-goals can be found. Of course, there is a high possi-
bility that an exact approach fails to give a solution, and for this reason, some of the
sub-goals need to be relaxed. However, this relaxation process is still part of the exact
approach, which makes it different from the heuristic approaches. Although exact ap-
proaches lead to better solutions than heuristic algorithms, they are computationally
demanding and can be applied only to small and medium-size datasets.
The fifth and final dimension determines whether a hiding algorithm preprocesses
the user-specified sensitive rules so that a minimal set of sensitive rules are given
as input to the hiding technique. The corresponding techniques make use of the
border of the frequent itemsets [ 31 ], which provides a compact representation of the
frequent itemsets mined from a database, to facilitate knowledge hiding. Specifically,
given the set of frequent itemsets F mined from the transactions of database U , the
negative border of F , denoted as B ( F ), is defined as the set of all infrequent
itemsets mined from U in which all proper subsets appear in F . Similarly, the
positive border of F , denoted as B + ( F ), is defined as the set of all maximally
frequent itemsets appearing in F . The positive and the negative border formulate
collectively the border B ( F ) of frequent itemsets. Formally stated, if
I
is the set of
all items appearing in U , then B ( F )
={
C x I
: C x /
F
∧∀
C y
C x : C y
F
}
,
B + ( F )
B + ( F ).
Border-based hiding techniques compute the border of the frequent itemsets and
modify it appropriately by recomputing it, in such a way, that a minimal set of
sensitive rules joins the newly computed border. The algorithms are subsequently
driven by the negative and positive border for hiding the rules [ 21 , 31 ].
B ( F )
={
C x I
: C x
F
∧∀
C y
C x : C y /
F
}
, and B ( F )
=
3.3
Heuristic and Exact ARH Algorithms
Among the different dimensions in the taxonomy of frequent itemset and association
rule hiding algorithms, the prevalent dimension regards the nature of the hiding
algorithms. As stated before, ARH algorithms can be divided into two broad classes,
Search WWH ::




Custom Search