Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

smaller subproblem that can be solved recursively. The FP-growth approach uses the

suffix-based pattern exploration, as illustrated in Fig. 2.9 . In addition, the FP-growth

approach uses an efficient data structure, known as the FP-Tree to represent the con-

ditional transaction database

T i with the use of compressed prefixes . The FP-Tree

will be discussed in more detail in a later section. The suffix in the top level call to

the algorithm is the null itemset.

Recursive suffix-based exploration of the pattern space is, in principle, no different

from prefix-based exploration of the enumeration tree space with the ordering of the

items reversed. In other words, by using a reverse ordering of items, suffix-based

recursive pattern space exploration can be simulated with prefix-based enumeration

tree exploration. Indeed, as discussed in the last section, prefix-based enumeration

tree methods order items from the least frequent to the most frequent, whereas the

suffix-based methods of this section order items from the most frequent to the least

frequent, to account for this difference. Thus, suffix-based recursive growth has an

execution tree that is identical in structure to a prefix-based enumeration tree. This

is a difference only of convention, but it does not affect the pattern space that is

explored.

It is instructive to compare the suffix-based exploration with the pseudocode of

the prefix-based TreeProjection algorithm in Fig. 2.6 . The two pseudocodes are

structured differently because the initial pre-processing pass of removing frequent

items is not assumed in the TreeProjection algorithm. Therefore, in each recursive

call of the prefix-based TreeProjection , frequent itemsets must be counted before they

are reported. In suffix-based exploration, this step is done as a preprocessing step

(for the top-level call) and just before the recursive call for deeper calls. Therefore,

each recursive call always starts with a database of frequent items. This is, of course,

a difference in terms of how the recursive calls are structured but is not different

in terms of the basic search strategy, or the amount of overall computational work

required, because infrequent items need to be removed in either case. A few other

key differences are evident:

TreeProjection uses database projections on top of a prefix-based enumeration

tree. Suffix-based recursive methods have a recursion tree whose structure is

similar to an enumeration tree on the frequent suffixes instead of the prefixes. The

removal of infrequent items from

T i in FP-growth is similar to determining which

branches of the enumeration tree to extend further.

The use of suffix-based exploration is a difference only of convention from

prefix-based exploration. For example, after reversing the item order, one might

implement FP-growth by growing patterns on the prefixes, but constructing a

compressed FP-Tree on 1 the suffixes. The resulting exploration order and execu-

tion in the two different implementations of FP-growth will be identical, but the

latter can be more easily related to traditional enumeration tree methods.

1

The resulting FP-Tree will be a suffix-based trie.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home