Essentially, compared to the earlier level-wise algorithm, this algorithm traverses the search space in a different order, in which some long patterns are already considered before some shorter patterns are evaluated. As a result, optimizations based on the fact that short patterns have been seen before long patterns cannot be used. In practice, however, these algorithms can be more efficient. The reason for this is that most implementations take care to maintain data structures that allow for incremental constraint evaluation: for instance, to calculate the support of a pattern, they do not traverse the whole dataset, but only consider those transactions covered by the pattern's parent in the search tree. As depth-first algorithms do not need to maintain a large number of candidates, maintaining such additional data structures is feasible. A well-known data structure in this context is the FP-tree (see the chapter on pattern growth for more details) [27].
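To make this concrete, the following minimal sketch (all names, such as mine, are ours and not from the chapter) shows incremental support evaluation in a depth-first miner: each node of the search tree keeps the list of transactions covered by its pattern, so evaluating an extension only requires scanning the parent's cover, never the full dataset.

```python
def mine(prefix, cover, candidates, min_support, results):
    # prefix: the current itemset; cover: the transactions (sets of items)
    # that contain it. Every call reports one pattern (including the empty
    # pattern at the root) together with its support.
    results.append((prefix, len(cover)))
    for i, item in enumerate(candidates):
        # Incremental constraint evaluation: intersect with the parent's
        # cover only, instead of re-scanning the whole dataset.
        new_cover = [t for t in cover if item in t]
        if len(new_cover) >= min_support:  # anti-monotonic pruning
            mine(prefix | {item}, new_cover, candidates[i + 1:],
                 min_support, results)

# Toy dataset: find all itemsets occurring in at least 2 transactions.
D = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {2}]
results = []
mine(frozenset(), D, [1, 2, 3], 2, results)
```

Passing candidates[i + 1:] to the recursive call ensures every itemset is enumerated exactly once; roughly speaking, an FP-tree plays the same role as the explicit cover lists here, but in a compressed form.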
Note that the above algorithm works for any pattern language, including graphs,
strings and trees, as long as we know that the constraint ϕ is anti-monotonic.
When some constraints are not anti-monotonic, a basic approach for dealing with
them, as in the case of breadth-first search, is to ignore them during the search
and post-process the output of the above algorithm. A similar trick can be used for
boundable constraints. In this case, the anti-monotonic bound is used during the
depth-first search, and each pattern found is finally evaluated using the original
constraint in a post-processing step.
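As a hedged sketch of these two generic strategies (reusing the mine routine from the sketch above; the helper name and the example constraint are our own), one can run the depth-first search with only the anti-monotonic constraint, or with the anti-monotonic bound of a boundable constraint, and filter the output afterwards:

```python
def mine_then_filter(D, items, min_support, constraint):
    # Step 1: search using only the anti-monotonic (bounding) constraint.
    found = []
    mine(frozenset(), D, items, min_support, found)
    # Step 2: post-process, evaluating the original constraint exactly.
    return [(p, s) for (p, s) in found if constraint(p, s)]

# Example: "size is exactly 2" is not anti-monotonic, so it is checked
# only in the post-processing step.
pairs = mine_then_filter(D, [1, 2, 3], 2,
                         lambda pattern, support: len(pattern) == 2)
```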
Many studies have explored the possibilities for deriving more efficient algorithms for constraints more complex than anti-monotonic ones. Most of these studies have focused on the pattern language of itemsets, as additional pruning appears to be easiest to derive for this language. We will discuss these approaches in a generic way in the next section, inspired by the work of Bucila et al. and Guns et al. [8, 13].
4.2 Constraint-based Itemset Mining
The key idea in efficient depth-first constraint-based itemset mining algorithms is to maintain four sets in each node of the search tree, which are upper and lower bounds on the itemsets and transaction sets that can still be found:

• I_U, the largest itemset we believe we can still find;
• I_L, the smallest itemset we believe we can still find;
• T_U, the largest transaction set we believe we can still find;
• T_L, the smallest transaction set we believe we can still find.
For some constraints, not all four of these sets need to be maintained, but in the most generic setting all four are maintained.
During the search, any modification of one of these four sets may be a reason to modify another of them as well. We will refer to this process of modifying one set based on the modification of another set as propagation. Approaches differ in the algorithms and data structures they use to perform propagation.
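As an example of what propagation can look like for a minimum-support constraint (a sketch under our own naming; the chapter does not fix an implementation): adding an item to I_L forces T_U to shrink, which in turn may force items out of I_U, and may prune the branch entirely.

```python
def propagate(I_L, I_U, T_U, D, min_support):
    # D maps transaction ids to itemsets. T_L is omitted here: the
    # minimum-support constraint does not need it.
    # T_U may only contain transactions covering every item in I_L.
    T_U = {t for t in T_U if I_L <= D[t]}
    if len(T_U) < min_support:
        return None  # no itemset in this branch can reach min_support
    # Items of I_U that are infrequent within the reduced T_U can never
    # be added to the pattern anymore, so they are removed from I_U.
    I_U = {i for i in I_U
           if i in I_L or sum(1 for t in T_U if i in D[t]) >= min_support}
    return I_L, I_U, T_U

# After adding item 1 to I_L, T_U shrinks to the transactions containing 1.
D = {0: {1, 2, 3}, 1: {1, 2}, 2: {2, 3}, 3: {1, 3}}
state = propagate({1}, {1, 2, 3}, set(D), D, 2)
```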
An overview of the generic algorithm is given in Algorithm 3, in which initially I_L = T_L = ∅, I_U = I, and T_U = D. Line 1 performs the propagation for the constraints.
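Algorithm 3 itself is not reproduced here; the following skeleton is a sketch consistent with the description above, assuming the propagate routine from the previous sketch. It starts from the initial bounds, propagates in each node (the counterpart of line 1), and branches on one undecided item at a time.

```python
def search(I_L, I_U, T_U, D, min_support, results):
    state = propagate(I_L, I_U, T_U, D, min_support)  # cf. line 1
    if state is None:
        return  # propagation proved this branch empty
    I_L, I_U, T_U = state
    if I_L == I_U:
        # All items are decided: report the pattern and its cover.
        results.append((I_L, T_U))
        return
    # Branch on an undecided item: include it (left) or exclude it (right).
    item = next(iter(I_U - I_L))
    search(I_L | {item}, I_U, T_U, D, min_support, results)
    search(I_L, I_U - {item}, T_U, D, min_support, results)

# Initial bounds as in the text: I_L = T_L = ∅, I_U = I, T_U = D.
D = {0: {1, 2, 3}, 1: {1, 2}, 2: {2, 3}, 3: {1, 3}}
results = []
search(set(), {1, 2, 3}, set(D), D, 2, results)
```

Each itemset satisfying the constraint is reached exactly once, since every branch fixes the status of one item permanently.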