Database Reference
In-Depth Information
smaller subproblem that can be solved recursively. The FP-growth approach uses the
suffix-based pattern exploration, as illustrated in Fig. 2.9 . In addition, the FP-growth
approach uses an efficient data structure, known as the FP-Tree to represent the con-
ditional transaction database
T i with the use of compressed prefixes . The FP-Tree
will be discussed in more detail in a later section. The suffix in the top level call to
the algorithm is the null itemset.
Recursive suffix-based exploration of the pattern space is, in principle, no different
from prefix-based exploration of the enumeration tree space with the ordering of the
items reversed. In other words, by using a reverse ordering of items, suffix-based
recursive pattern space exploration can be simulated with prefix-based enumeration
tree exploration. Indeed, as discussed in the last section, prefix-based enumeration
tree methods order items from the least frequent to the most frequent, whereas the
suffix-based methods of this section order items from the most frequent to the least
frequent, to account for this difference. Thus, suffix-based recursive growth has an
execution tree that is identical in structure to a prefix-based enumeration tree. This
is a difference only of convention, but it does not affect the pattern space that is
explored.
It is instructive to compare the suffix-based exploration with the pseudocode of
the prefix-based TreeProjection algorithm in Fig. 2.6 . The two pseudocodes are
structured differently because the initial pre-processing pass of removing frequent
items is not assumed in the TreeProjection algorithm. Therefore, in each recursive
call of the prefix-based TreeProjection , frequent itemsets must be counted before they
are reported. In suffix-based exploration, this step is done as a preprocessing step
(for the top-level call) and just before the recursive call for deeper calls. Therefore,
each recursive call always starts with a database of frequent items. This is, of course,
a difference in terms of how the recursive calls are structured but is not different
in terms of the basic search strategy, or the amount of overall computational work
required, because infrequent items need to be removed in either case. A few other
key differences are evident:
￿
TreeProjection uses database projections on top of a prefix-based enumeration
tree. Suffix-based recursive methods have a recursion tree whose structure is
similar to an enumeration tree on the frequent suffixes instead of the prefixes. The
removal of infrequent items from
T i in FP-growth is similar to determining which
branches of the enumeration tree to extend further.
￿
The use of suffix-based exploration is a difference only of convention from
prefix-based exploration. For example, after reversing the item order, one might
implement FP-growth by growing patterns on the prefixes, but constructing a
compressed FP-Tree on 1 the suffixes. The resulting exploration order and execu-
tion in the two different implementations of FP-growth will be identical, but the
latter can be more easily related to traditional enumeration tree methods.
1
The resulting FP-Tree will be a suffix-based trie.
Search WWH ::




Custom Search