Pattern-Growth Methods - Frequent Pattern Mining

Database Reference

In-Depth Information

To overcome this difficulty, a new approach, called frequent pattern growth , has been

developed, in a series of studies, such as [ 17 , 18 , 27 - 30 ], which adopts a divide-and-

conquer methodology and mines frequent patterns without candidate generation. The

approach has several distinct features:

1. Instead of generating a large number of candidates, the method preserves (in

some compressed forms) the essential groupings of the original data elements for

mining. Then the analysis is focused on counting the frequency of the relevant

data sets instead of candidate sets.

2. Instead of scanning the entire database to match against the whole corresponding

set of candidates in each pass, the method partitions the data set to be examined as

well as the set of patterns to be examined by database projection. Such a divide-

and-conquer methodology substantially reduces the search space and leads to

high performance.

3. With the growing capacity of main memory and the substantial reduction of

database size by database projection as well as the space needed for manip-

ulating large sets of candidates, a substantial portion of data can be put into

main memory for mining. New data structures and methods, such as FP-tree and

pseudo-projection (for mining sequential patterns), have been developed for data

compression and pointer-based traversal. The performance studies have shown

the effectiveness of such techniques.

A few pieces of work have contributed to the development of the frequent pattern-

growth methodology, as illustrated below.

The TreeProjection method [ 4 ] proposes a database projection technique which

explores the projected databases associated with different frequent itemsets. The FP-

growth algorithm [ 18 ] performs database projection when the database size is huge

and then constructs a compressed data structure, FP-tree, when the compressed tree

can fit in main memory. The remaining mining will be focused on the recursively

generated, projected FP-trees. Besides mining frequent itemsets, the FP-tree structure

can be used for mining frequent closed itemsets, which is presented in the CLOSET

algorithm [ 28 ].

The frequent pattern-growth methodology influences constraint-based mining of

frequent itemsets as well. The constraint-pushing techniques developed for Apriori-

based mining [ 25 ] can be applied to pattern growth mining. In addition, some

complex kinds of constraints, such as convertible constraints, which cannot be pushed

deep into the mining process by Apriori, can be done so with frequent pattern growth

[ 29 ], due to the facts that (1) pattern growth only needs to examine part of the database

(the projected one), and (2) data can be organized in a structured way to facilitate

the controlled growth of frequent patterns.

Similar divide-and-conquer ideas but different projection techniques have been

developed for mining sequential patterns, which are presented in two algorithms,

FreeSpan [ 17 ] and PrefixSpan [ 30 ]. The performance study shows that both methods

outperform the classical Apriori-based sequential pattern mining algorithm GSP [ 35 ],

and PrefixSpan has considerably better performance than FreeSpan.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home