Database Reference
In-Depth Information
effectiveness of these different strategies depend on each other. For example, the
effectiveness of a pruning strategy may be dependent on the order of exploration of
the candidates (level-wise vs. depth first), and the effectiveness of counting is also
dependent on the order of exploration because the work done for counting at the
higher levels (shorter itemsets) can be reused at the lower levels (longer itemsets)
with certain strategies, such as those explored in TreeProjection and FP-growth .
Surprising as it might seem, virtually all frequent pattern mining algorithms can be
considered complex variations of this simple baseline pseudocode. The major chal-
lenge of all of these methods is that the number of frequent patterns and candidate
patterns can sometimes be large. This is a fundamental problem of frequent pattern
mining although it is possible to speed up the counting of the different candidate
patterns with the use of various tricks such as database projections. An analysis on
the number of candidate patterns may be found in [ 25 ].
The candidate generation process of the earliest algorithms used joins. The original
Apriori algorithm belongs to this category [ 1 ]. Although Apriori is presented as a join-
based algorithm, it can be shown that the algorithm is a breadth first exploration of a
structured arrangement of the itemsets, known as a lexicographic tree or enumeration
tree . Therefore, later classes of algorithms explicitly discuss tree-based enumeration
[ 4 , 5 ]. The algorithms assume a lexicographic tree (or enumeration tree) of candidate
patterns and explore the tree using breadth-first or depth-first strategies. The use of
the enumeration tree forms the basis for understanding search space decomposition,
as in the case of the TreeProjection algorithm [ 5 ]. The enumeration tree concept is
very useful because it provides an understanding of how the search space of candidate
patterns may be explored in a systematic and non-redundant way. Frequent pattern
mining algorithms typically need to evaluate the support of frequent portions of
the enumeration tree, and also rule out an additional layer of infrequent extensions
of the frequent nodes in the enumeration tree. This makes the candidate space of
all frequent pattern mining algorithms virtually invariant unless one is interested in
particular types of patterns such as maximal patterns.
The enumeration tree is defined on the prefixes of frequent itemsets, and will
be introduced later in this chapter. Later algorithms such as FP-growth perform
suffix-based recursive exploration of the search space. In other words, the frequent
patterns with a particular pattern as a suffix are explored at one time. This is because
FP-growth uses the opposite item ordering convention as most enumeration tree
algorithms though the recursive exploration order of FP-growth is similar to an
enumeration tree.
Note that all classes of algorithms, implicitly or explicitly, explore the search
space of patterns defined by an enumeration tree of frequent patterns with different
strategies such as joins, prefix-based depth-first exploration, or suffix-based depth-
first exploration. However, there are significant differences in terms of the order in
which the search space is explored, the pruning methods used, and how the counting
is performed. In particular, certain projection-based methods help in reusing the
counting work for k -itemsets for ( k
+
1)-itemsets with the use of the notion of
projected databases. Many algorithms such as TreeProjection and FP-growth are
able to achieve this goal.
Search WWH ::




Custom Search