Database Reference
In-Depth Information
these items are not relevant to counting the extensions of P ∪{ i }
. This independent
problem is similar in structure to the original problem, and can be solved recursively.
Although it is natural to use recursion for the depth-first versions of TreeProjection ,
the breadth-first versions are not defined recursively. Nevertheless, the breadth-first
versions explore a pattern space of the same size as the depth-first versions, and are
no different either in terms of the tree size or the counting work done over the en-
tire algorithm. The major challenge in the breadth-first version is in maintaining the
projected transactions along the breadth of the tree, which is storage-intensive. It is
shown in [ 5 ], how many of these issues can be resolved with the use of a combination
of exploration strategies for tree growth and counting. Furthermore, it is also shown
in [ 5 ] how breadth-first and depth-first methods may be combined.
Note that this concept of database projection is common between TreeProjection
and FP-growth although there are some differences in the internal representation of
the projected databases. The aforementioned description is designed for discovering
all patterns, and does not incorporate maximal pattern pruning. When generating
all the itemsets, the main advantage of the depth-first strategy over the breadth-
first strategy is that it is less memory intensive. This is because one does not have
to simultaneously handle the large number of candidates along the breadth of the
enumeration tree at any point in the course of algorithm execution when combined
with counting data structures. The overall size of the candidate space is fixed, and
defined by the size of the enumeration tree. Therefore, over the entire execution of
the algorithm, there is no difference between the two strategies in terms of search
space size, beyond memory optimization.
Projection-based algorithms, such as TreeProjection , can be implemented either
recursively or non-recursively. Depth-first variations of projection strategies, such
as DepthProject and FP-growth , are generally implemented recursively in which
a particular prefix (or suffix) of frequent items is grown recursively (see Fig. 2.6 ).
For recursive variations, the structure and size of the recursion tree is the same as
the enumeration tree. Non-recursive variations of TreeProjection methods directly
present the projection-based algorithms in terms of the enumeration tree by storing
projected transactions at the nodes in the enumeration tree. Describing projection
strategies directly in terms of the enumeration tree is helpful, because one can use
the enumeration tree explicitly to optimize the projection. For example, one does
not need to project at every node of the enumeration tree, but project only when
the size of the database reduces by a particular factor with respect to the nearest
ancestor node where the last projection was stored. Such optimizations can reduce
the space-overhead of repeated elements in the projected databases at different levels
of the enumeration (recursion) tree. It has been shown how to use this optimization
in different variations of TreeProjection . Furthermore, breadth-first variations of the
strategy are naturally defined non-recursively in terms of the enumeration tree. The
recursive depth-first versions may be viewed either as divide-and-conquer strategies
(because they recursively solve a set of smaller subproblems), or as projection-based
counting reuse strategies. The notion of projection-based counting reuse clearly
describes how computational savings are achieved in both versions of the algorithm.
Search WWH ::




Custom Search