Database Reference
In-Depth Information
To overcome this difficulty, a new approach, called frequent pattern growth , has been
developed, in a series of studies, such as [ 17 , 18 , 27 - 30 ], which adopts a divide-and-
conquer methodology and mines frequent patterns without candidate generation. The
approach has several distinct features:
1. Instead of generating a large number of candidates, the method preserves (in
some compressed forms) the essential groupings of the original data elements for
mining. Then the analysis is focused on counting the frequency of the relevant
data sets instead of candidate sets.
2. Instead of scanning the entire database to match against the whole corresponding
set of candidates in each pass, the method partitions the data set to be examined as
well as the set of patterns to be examined by database projection. Such a divide-
and-conquer methodology substantially reduces the search space and leads to
high performance.
3. With the growing capacity of main memory and the substantial reduction of
database size by database projection as well as the space needed for manip-
ulating large sets of candidates, a substantial portion of data can be put into
main memory for mining. New data structures and methods, such as FP-tree and
pseudo-projection (for mining sequential patterns), have been developed for data
compression and pointer-based traversal. The performance studies have shown
the effectiveness of such techniques.
A few pieces of work have contributed to the development of the frequent pattern-
growth methodology, as illustrated below.
The TreeProjection method [ 4 ] proposes a database projection technique which
explores the projected databases associated with different frequent itemsets. The FP-
growth algorithm [ 18 ] performs database projection when the database size is huge
and then constructs a compressed data structure, FP-tree, when the compressed tree
can fit in main memory. The remaining mining will be focused on the recursively
generated, projected FP-trees. Besides mining frequent itemsets, the FP-tree structure
can be used for mining frequent closed itemsets, which is presented in the CLOSET
algorithm [ 28 ].
The frequent pattern-growth methodology influences constraint-based mining of
frequent itemsets as well. The constraint-pushing techniques developed for Apriori-
based mining [ 25 ] can be applied to pattern growth mining. In addition, some
complex kinds of constraints, such as convertible constraints, which cannot be pushed
deep into the mining process by Apriori, can be done so with frequent pattern growth
[ 29 ], due to the facts that (1) pattern growth only needs to examine part of the database
(the projected one), and (2) data can be organized in a structured way to facilitate
the controlled growth of frequent patterns.
Similar divide-and-conquer ideas but different projection techniques have been
developed for mining sequential patterns, which are presented in two algorithms,
FreeSpan [ 17 ] and PrefixSpan [ 30 ]. The performance study shows that both methods
outperform the classical Apriori-based sequential pattern mining algorithm GSP [ 35 ],
and PrefixSpan has considerably better performance than FreeSpan.
Search WWH ::




Custom Search