Database Reference
In-Depth Information
structure. By following the pointers to the same item, a projection of all transactions
ending with the corresponding item is built. This projection is also represented as a
CFP-Tree called local CFP-Tree. The local CFP-Tree is then traversed to extract the
frequent patterns in the projection.
H-Mine Algorithm The authors in [ 54 ] proposed an efficient algorithm called H-
Mine . It uses a memory efficient hyper-structure called H-Struct. The fundamental
strategy of H-Mine is to partition the database and mine each partition in the memory.
Finally, the results from different partitions are consolidated into global frequent
patterns. An intelligent module of H-Mine is that it can identify whether the database
is dense or sparse, and it is able to make dynamic choices between different data
structures based on this identification. More details may be found in Chap. 3 on
pattern-growth methods.
4.2.2
Improved Data Structure Variations
In this section, several variations of the basic algorithm by improving the underlying
data structure will be described.
Using Arrays A significant part of the mining time in FP-growth is spent on travers-
ing the tree. To reduce this time, the authors in [ 29 ] designed an array based
implementation of FP-growth , named FP-growth * which drastically reduces the
traversal time of the mining algorithm. It uses the FP-Tree data structure in com-
bination with an array-like data structure and it incorporates various optimization
schemes. It should be pointed out that the TreeProjection family of algorithms also
uses arrays, though the optimizations used are quite different.
When the input database is sparse, the array based technique performs well be-
cause the array saves the traversal time for all the items; moreover the initialization
of the next level of FP-Trees is easier using an array. But in case of dense database,
the tree base representation is more compact. To deal with the situation, FP-growth*
devises a mechanism to identify whether the database is sparse or not. To do so, FP-
growth* counts the number of nodes in each level of the tree. Based on experiments,
they found that if the upper quarter of the tree contains less than 15 % of the total
number of nodes, then the database is most likely dense. Otherwise, it is sparse. If
the database turns out to be sparse, FP-growth* allocates an array for each FP-Tree
in the next level of mining.
The nonordfp Approach This work [ 55 ] presented an improved implementation of
the well known FP-growth algorithm using an efficient FP-Tree like data structure
that allows faster allocation, traversal and optional projection. The tree nodes do
not store their labels (item identifiers). There is no concept of header table. The
data structure stores less administrative information in the tree node which allow the
recursive step of mining without rebuilding the tree.
Search WWH ::




Custom Search