transaction into all of its projected databases in one scan, whereas the latter projects
each transaction only to its first projected database (according to the ordering of
items). The former facilitates parallel processing but requires a large amount of disk
space to store all of the projected databases, whereas the latter ensures that the
additional disk space required is no more than the size of the original database, at
the cost of additional projections of the (projected) transactions to subsequent
projected databases in later processing.
After one or a few rounds of projections, the corresponding conditional FP-trees
should be able to fit in memory. Then a memory-based FP-tree can be constructed
for fast mining.
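The contrast between the two projection strategies can be illustrated with a small sketch. This is hypothetical illustrative code, not the published algorithm's implementation; it assumes items within each transaction are already listed in a fixed order, and it represents the projected database of an item as the list of prefixes of transactions ending at that item.

```python
def parallel_projection(db):
    """Project every transaction into ALL of its projected databases
    in one scan of the database (more space, parallelizable)."""
    projected = {}
    for t in db:
        for k, item in enumerate(t):
            if k > 0:  # the prefix before `item` goes to item's projected DB
                projected.setdefault(item, []).append(t[:k])
    return projected

def partition_projection(db):
    """Project each transaction only to the projected database of its
    last item; prefixes are re-projected later, when that projected
    database is processed (bounded extra space, extra passes)."""
    projected = {}
    for t in db:
        if len(t) > 1:
            projected.setdefault(t[-1], []).append(t[:-1])
    return projected

db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
print(parallel_projection(db))   # every prefix placed immediately
print(partition_projection(db))  # one projected DB per transaction
```

Under partition projection, the prefixes stored in one projected database are themselves projected onward when that database is mined, which is the "additional projections in later processing" mentioned above.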
The FP-growth algorithm is presented in [18]. Its performance analysis shows
that FP-growth mining of both long and short frequent itemsets is efficient and
scalable. It is about an order of magnitude faster than Apriori [1] and other candidate
generation-based algorithms, and is also faster than TreeProjection, a projection-
based algorithm proposed in [4].
In comparison with candidate generation-based algorithms, FP-growth has the
following advantages: (1) The FP-tree is highly compact, usually substantially smaller
than the original database, and thus saves costly database scans in the subsequent
mining process. (2) It avoids costly candidate set generation and testing by succes-
sively concatenating frequent 1-itemsets found in the (conditional) FP-trees: it never
generates candidate sets that are not in the database, because the itemset of any
transaction is always encoded in the corresponding path of the FP-trees. In this
context, the mining methodology is not Apriori-like (restricted) generation-and-test
but frequent pattern (fragment) growth only. The major operations of mining are
count accumulation and prefix path count adjustment, which are usually much less
costly than the candidate generation and itemset matching operations performed in
most Apriori-like algorithms. (3) It applies a partitioning-based divide-and-conquer
method that dramatically reduces the size of the subsequent conditional
sub-databases and conditional FP-trees. Several other optimization techniques,
including the ordering of frequent items and the use of the least frequent items
as suffixes, also contribute to the efficiency of the method.
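The overall process can be sketched compactly. The following is a minimal illustrative implementation, not the authors' code: it builds an FP-tree over the frequent items, then mines it by taking each item (least frequent first) as a suffix, collecting its prefix paths as a conditional pattern base, and recursing, so patterns grow by concatenating frequent 1-itemsets exactly as described above.

```python
from collections import defaultdict

def fp_growth(transactions, min_support):
    """Minimal FP-growth sketch: build an FP-tree, then grow frequent
    patterns from conditional pattern bases. For illustration only."""
    # 1. Count items and keep the frequent ones, ranked by descending count.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    freq = {i: c for i, c in counts.items() if c >= min_support}
    rank = {i: r for r, i in enumerate(sorted(freq, key=lambda i: -freq[i]))}

    # 2. Build the FP-tree; a node is [item, count, parent, children].
    root = [None, 0, None, {}]
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in freq), key=rank.get):
            child = node[3].get(item)
            if child is None:
                child = [item, 0, node, {}]
                node[3][item] = child
                header[item].append(child)
            child[1] += 1  # count accumulation along the shared prefix path
            node = child

    # 3. Mine: take each item as suffix (least frequent first), gather its
    #    prefix paths as a conditional pattern base, and recurse.
    patterns = {}
    for item in sorted(freq, key=lambda i: freq[i]):
        support = sum(node[1] for node in header[item])
        patterns[frozenset([item])] = support
        cond_base = []
        for node in header[item]:
            path, p = [], node[2]
            while p[0] is not None:  # climb to the root, collecting the prefix
                path.append(p[0])
                p = p[2]
            cond_base.extend([path[::-1]] * node[1])
        for suffix, s in fp_growth(cond_base, min_support).items():
            patterns[suffix | {item}] = s  # grow the pattern by one fragment
    return patterns
```

Note that no candidate set is ever generated: every pattern emitted is assembled from paths that actually occur in the (conditional) trees.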
Besides mining frequent itemsets, an extension of the FP-growth method, called
CLOSET [28], can be used to mine frequent closed itemsets and max-patterns, where
a frequent closed itemset is a frequent itemset c such that no proper superset of c
shares the same support count as c, and a max-pattern is a frequent pattern p such
that no proper superpattern of p is frequent. Max-patterns and frequent closed
itemsets can be used to reduce the number of frequent itemsets and association
rules generated in association mining.
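The two definitions can be made concrete with a brute-force sketch (illustrative only, not the CLOSET algorithm) that filters a table of frequent itemsets and their supports:

```python
def closed_and_max(freq_patterns):
    """Given {frozenset itemset: support}, return the frequent closed
    itemsets (no proper superset with the same support) and the
    max-patterns (no frequent proper superset). Brute-force sketch."""
    closed, maximal = {}, {}
    for p, s in freq_patterns.items():
        supersets = [q for q in freq_patterns if p < q]  # proper supersets
        if not any(freq_patterns[q] == s for q in supersets):
            closed[p] = s    # no frequent superset matches p's support
        if not supersets:
            maximal[p] = s   # no frequent superset at all
    return closed, maximal
```

For example, if {a} and {a, b} both have support 3 while {b} has support 4, then {a} is not closed (its superset {a, b} has the same support), {b} and {a, b} are closed, and {a, b} is the only max-pattern; this is how the two notions compress the set of frequent itemsets.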
By frequent pattern growth, one can also mine closed frequent itemsets and max-
patterns using the FP-tree structure. Moreover, a single prefix-path compression
technique can be developed for compressing FP-trees or conditional FP-trees that
contain single prefix paths. This further enhances performance and reduces the
effort of redundancy checking when mining closed frequent itemsets and max-patterns.