Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

effectiveness of these different strategies depend on each other. For example, the

effectiveness of a pruning strategy may be dependent on the order of exploration of

the candidates (level-wise vs. depth first), and the effectiveness of counting is also

dependent on the order of exploration because the work done for counting at the

higher levels (shorter itemsets) can be reused at the lower levels (longer itemsets)

with certain strategies, such as those explored in TreeProjection and FP-growth .

Surprising as it might seem, virtually all frequent pattern mining algorithms can be

considered complex variations of this simple baseline pseudocode. The major chal-

lenge of all of these methods is that the number of frequent patterns and candidate

patterns can sometimes be large. This is a fundamental problem of frequent pattern

mining although it is possible to speed up the counting of the different candidate

patterns with the use of various tricks such as database projections. An analysis on

the number of candidate patterns may be found in [ 25 ].

The candidate generation process of the earliest algorithms used joins. The original

Apriori algorithm belongs to this category [ 1 ]. Although Apriori is presented as a join-

based algorithm, it can be shown that the algorithm is a breadth first exploration of a

structured arrangement of the itemsets, known as a lexicographic tree or enumeration

tree . Therefore, later classes of algorithms explicitly discuss tree-based enumeration

[ 4 , 5 ]. The algorithms assume a lexicographic tree (or enumeration tree) of candidate

patterns and explore the tree using breadth-first or depth-first strategies. The use of

the enumeration tree forms the basis for understanding search space decomposition,

as in the case of the TreeProjection algorithm [ 5 ]. The enumeration tree concept is

very useful because it provides an understanding of how the search space of candidate

patterns may be explored in a systematic and non-redundant way. Frequent pattern

mining algorithms typically need to evaluate the support of frequent portions of

the enumeration tree, and also rule out an additional layer of infrequent extensions

of the frequent nodes in the enumeration tree. This makes the candidate space of

all frequent pattern mining algorithms virtually invariant unless one is interested in

particular types of patterns such as maximal patterns.

The enumeration tree is defined on the prefixes of frequent itemsets, and will

be introduced later in this chapter. Later algorithms such as FP-growth perform

suffix-based recursive exploration of the search space. In other words, the frequent

patterns with a particular pattern as a suffix are explored at one time. This is because

FP-growth uses the opposite item ordering convention as most enumeration tree

algorithms though the recursive exploration order of FP-growth is similar to an

enumeration tree.

Note that all classes of algorithms, implicitly or explicitly, explore the search

space of patterns defined by an enumeration tree of frequent patterns with different

strategies such as joins, prefix-based depth-first exploration, or suffix-based depth-

first exploration. However, there are significant differences in terms of the order in

which the search space is explored, the pruning methods used, and how the counting

is performed. In particular, certain projection-based methods help in reusing the

counting work for k -itemsets for ( k

+

1)-itemsets with the use of the notion of

projected databases. Many algorithms such as TreeProjection and FP-growth are

able to achieve this goal.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home