Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

Various database projection methods are different in terms of the specific data

structures used for the projected database. The different variations of TreeProjec-

tion use arrays and bit strings to represent the projected database. The FP-growth

method uses an FP-Tree. The FP-Tree will be discussed in the next section. Later

variations of FP-Tree also use combinations of arrays and pointers to represent

the projected database. Some variations, such as OpportuneProject [ 38 ], combine

different data structures in an optimized way to obtain the best result.

Suffix-based recursive growth is inherently defined as a depth-first strategy. On

the other hand, as is evident from the discussion in [ 5 ], the specific choice of ex-

ploration strategy on the enumeration tree is orthogonal to the process of database

projection. The overall size of the enumeration tree is the same, no matter how it is

explored, unless maximal pattern pruning is used. Thus, TreeProjection explores

a variety of strategies such as breadth-first and depth-first strategies, with no dif-

ference to the (overall) work required for counting. The major challenge with the

breadth-first strategy is the simultaneous maintenance of projected transaction

sets along the breadth of the tree. The issue of effective memory management of

breadth-first strategies is discussed in [ 5 ], which shows how certain optimizations

such as cache-blocking can improve the effectiveness in this case. Breadth-first

strategies also allow certain kinds of pruning such as level-wise pruning.

The major advantages of depth-first strategies arise in the context of maximal pat-

tern mining. This is because a depth-first strategy discovers the maximal patterns

very early, which can be used to prune the smaller non-maximal patterns. In this

case, the size of the search space explored truly reduces because of a depth-first

strategy. This issue is discussed in the section on maximal pattern mining. The

advantages for maximal pattern mining were first proposed in the context of the

DepthProject algorithm [ 4 ].

Next, we will describe the FP-Tree data structure that uses compressed representa-

tions of the transaction database for more efficient counting.

4.1

The FP-Growth Approach

The FP-growth approach combines suffix-based pattern exploration with a com-

pressed representation of the projected database for more efficient counting. The

prefix-based FP-Tree is a compressed representation of the database which is built

by considering a fixed order among the items in an itemset [ 32 ]. This tree is used to

represent the conditional transaction sets

T i of Fig. 2.9 . An FP-Tree may be

viewed as a prefix-based trie data structure of the transaction database of frequent

items. Just as each node in a trie is labeled with a symbol, a node in the FP-Tree is

labeled with an item. In addition, the node holds the support of the itemset defined

by the items of the nodes that are on the path from the root to u . By consolidating the

prefixes, one obtains compression. This is useful for effective memory management.

On the other hand, the maintenance of counts and pointers with the prefixes is an

T

and

Frequent Pattern Mining

Search WWH ::

Custom Search

Home