Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

these items are not relevant to counting the extensions of P ∪{ i }

. This independent

problem is similar in structure to the original problem, and can be solved recursively.

Although it is natural to use recursion for the depth-first versions of TreeProjection ,

the breadth-first versions are not defined recursively. Nevertheless, the breadth-first

versions explore a pattern space of the same size as the depth-first versions, and are

no different either in terms of the tree size or the counting work done over the en-

tire algorithm. The major challenge in the breadth-first version is in maintaining the

projected transactions along the breadth of the tree, which is storage-intensive. It is

shown in [ 5 ], how many of these issues can be resolved with the use of a combination

of exploration strategies for tree growth and counting. Furthermore, it is also shown

in [ 5 ] how breadth-first and depth-first methods may be combined.

Note that this concept of database projection is common between TreeProjection

and FP-growth although there are some differences in the internal representation of

the projected databases. The aforementioned description is designed for discovering

all patterns, and does not incorporate maximal pattern pruning. When generating

all the itemsets, the main advantage of the depth-first strategy over the breadth-

first strategy is that it is less memory intensive. This is because one does not have

to simultaneously handle the large number of candidates along the breadth of the

enumeration tree at any point in the course of algorithm execution when combined

with counting data structures. The overall size of the candidate space is fixed, and

defined by the size of the enumeration tree. Therefore, over the entire execution of

the algorithm, there is no difference between the two strategies in terms of search

space size, beyond memory optimization.

Projection-based algorithms, such as TreeProjection , can be implemented either

recursively or non-recursively. Depth-first variations of projection strategies, such

as DepthProject and FP-growth , are generally implemented recursively in which

a particular prefix (or suffix) of frequent items is grown recursively (see Fig. 2.6 ).

For recursive variations, the structure and size of the recursion tree is the same as

the enumeration tree. Non-recursive variations of TreeProjection methods directly

present the projection-based algorithms in terms of the enumeration tree by storing

projected transactions at the nodes in the enumeration tree. Describing projection

strategies directly in terms of the enumeration tree is helpful, because one can use

the enumeration tree explicitly to optimize the projection. For example, one does

not need to project at every node of the enumeration tree, but project only when

the size of the database reduces by a particular factor with respect to the nearest

ancestor node where the last projection was stored. Such optimizations can reduce

the space-overhead of repeated elements in the projected databases at different levels

of the enumeration (recursion) tree. It has been shown how to use this optimization

in different variations of TreeProjection . Furthermore, breadth-first variations of the

strategy are naturally defined non-recursively in terms of the enumeration tree. The

recursive depth-first versions may be viewed either as divide-and-conquer strategies

(because they recursively solve a set of smaller subproblems), or as projection-based

counting reuse strategies. The notion of projection-based counting reuse clearly

describes how computational savings are achieved in both versions of the algorithm.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home