Database Reference
In-Depth Information
￿
Various database projection methods are different in terms of the specific data
structures used for the projected database. The different variations of TreeProjec-
tion use arrays and bit strings to represent the projected database. The FP-growth
method uses an FP-Tree. The FP-Tree will be discussed in the next section. Later
variations of FP-Tree also use combinations of arrays and pointers to represent
the projected database. Some variations, such as OpportuneProject [ 38 ], combine
different data structures in an optimized way to obtain the best result.
￿
Suffix-based recursive growth is inherently defined as a depth-first strategy. On
the other hand, as is evident from the discussion in [ 5 ], the specific choice of ex-
ploration strategy on the enumeration tree is orthogonal to the process of database
projection. The overall size of the enumeration tree is the same, no matter how it is
explored, unless maximal pattern pruning is used. Thus, TreeProjection explores
a variety of strategies such as breadth-first and depth-first strategies, with no dif-
ference to the (overall) work required for counting. The major challenge with the
breadth-first strategy is the simultaneous maintenance of projected transaction
sets along the breadth of the tree. The issue of effective memory management of
breadth-first strategies is discussed in [ 5 ], which shows how certain optimizations
such as cache-blocking can improve the effectiveness in this case. Breadth-first
strategies also allow certain kinds of pruning such as level-wise pruning.
￿
The major advantages of depth-first strategies arise in the context of maximal pat-
tern mining. This is because a depth-first strategy discovers the maximal patterns
very early, which can be used to prune the smaller non-maximal patterns. In this
case, the size of the search space explored truly reduces because of a depth-first
strategy. This issue is discussed in the section on maximal pattern mining. The
advantages for maximal pattern mining were first proposed in the context of the
DepthProject algorithm [ 4 ].
Next, we will describe the FP-Tree data structure that uses compressed representa-
tions of the transaction database for more efficient counting.
4.1
The FP-Growth Approach
The FP-growth approach combines suffix-based pattern exploration with a com-
pressed representation of the projected database for more efficient counting. The
prefix-based FP-Tree is a compressed representation of the database which is built
by considering a fixed order among the items in an itemset [ 32 ]. This tree is used to
represent the conditional transaction sets
T i of Fig. 2.9 . An FP-Tree may be
viewed as a prefix-based trie data structure of the transaction database of frequent
items. Just as each node in a trie is labeled with a symbol, a node in the FP-Tree is
labeled with an item. In addition, the node holds the support of the itemset defined
by the items of the nodes that are on the path from the root to u . By consolidating the
prefixes, one obtains compression. This is useful for effective memory management.
On the other hand, the maintenance of counts and pointers with the prefixes is an
T
and
Search WWH ::




Custom Search