Database Reference
In-Depth Information
3.2
TreeProjection Algorithms
Two variants of an algorithm which use recursive projections of the transactions
down the lexicographic tree structure are proposed in [ 5 ] and [ 4 ], respectively. The
goal of using these recursive projections is to reuse the counting work down at a given
level for lower levels of the tree. This reduces the counting work at the lower levels
by orders of magnitude, as long as it is possible to successfully manage the memory
requirements of the projected transactions. The main difference between the different
versions of TreeProjection is the exploration strategy used. TreeProjection can be
viewed as a generic framework that advocates the notion of database projection, in
the context of several different strategies for constructing the enumeration tree, such
as a breadth-first, depth-first, or a combination of the two. The depth-first version,
described in detail in [ 4 ], also incorporates maximal pruning, though the disabling of
the pruning options can also materialize all the patterns. The breadth-first and depth-
first algorithms have different advantages. The former allows level-wise pruning
which is not possible in depth-first methods though it is often not used in projection-
based methods. The depth-first version allows better memory management. The
depth-first approach works best when the itemsets are very long, and it is desirable
to quickly discover maximal patterns, so that portions of the lexicographic tree can
be pruned off quickly during exploration and it can also be used for discovering
all patterns including non-maximal ones. When all patterns are required, including
non-maximal ones, the primary difference between different strategies is not one
of the size of the candidate space, but that of effective memory management of the
projected transactions. This is because the size of the candidate space is defined by
the size of the enumeration tree, which is fixed, and is agnostic to the strategy used for
tree exploration. On the other hand, memory management of projected transactions
is easier with the depth-first strategy because one only needs to maintain a small
number of projected transaction sets along the depth of the tree. The notion of
database projection is common to TreeProjection and FP-growth , and helps reduce
the counting work by restricting the size of the database used for support counting.
TreeProjection was developed independently from FP-growth . While the FP-growth
paper provides a brief discussion of TreeProjection , this chapter will provide a more
detailed discussion of the similarities and differences between the two methods. One
major difference between the two methods is that the internal representation of the
corresponding projected databases is different in the two cases.
The basic database projection approach is very similar in both cases of TreeProjec-
tion and FP-growth . An important observation is that if a transaction is not relevant
for counting at a given node in the enumeration tree, then it will not be relevant
for counting in any descendent of that node. Therefore, only those transactions are
retained that contain all items in P for counting at the node P in the projected trans-
actions. Note that this set strictly reduces as we move to lower levels of the tree, and
the set of relevant transactions at the lower level of the enumeration tree is a subset of
the set at a higher level. Furthermore, only the presence of items corresponding to the
candidate extensions of a node are relevant for counting at any of the subtrees rooted
Search WWH ::




Custom Search