Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

structure. By following the pointers to the same item, a projection of all transactions

ending with the corresponding item is built. This projection is also represented as a

CFP-Tree called local CFP-Tree. The local CFP-Tree is then traversed to extract the

frequent patterns in the projection.

H-Mine Algorithm The authors in [ 54 ] proposed an efficient algorithm called H-

Mine . It uses a memory efficient hyper-structure called H-Struct. The fundamental

strategy of H-Mine is to partition the database and mine each partition in the memory.

Finally, the results from different partitions are consolidated into global frequent

patterns. An intelligent module of H-Mine is that it can identify whether the database

is dense or sparse, and it is able to make dynamic choices between different data

structures based on this identification. More details may be found in Chap. 3 on

pattern-growth methods.

4.2.2

Improved Data Structure Variations

In this section, several variations of the basic algorithm by improving the underlying

data structure will be described.

Using Arrays A significant part of the mining time in FP-growth is spent on travers-

ing the tree. To reduce this time, the authors in [ 29 ] designed an array based

implementation of FP-growth , named FP-growth * which drastically reduces the

traversal time of the mining algorithm. It uses the FP-Tree data structure in com-

bination with an array-like data structure and it incorporates various optimization

schemes. It should be pointed out that the TreeProjection family of algorithms also

uses arrays, though the optimizations used are quite different.

When the input database is sparse, the array based technique performs well be-

cause the array saves the traversal time for all the items; moreover the initialization

of the next level of FP-Trees is easier using an array. But in case of dense database,

the tree base representation is more compact. To deal with the situation, FP-growth*

devises a mechanism to identify whether the database is sparse or not. To do so, FP-

growth* counts the number of nodes in each level of the tree. Based on experiments,

they found that if the upper quarter of the tree contains less than 15 % of the total

number of nodes, then the database is most likely dense. Otherwise, it is sparse. If

the database turns out to be sparse, FP-growth* allocates an array for each FP-Tree

in the next level of mining.

The nonordfp Approach This work [ 55 ] presented an improved implementation of

the well known FP-growth algorithm using an efficient FP-Tree like data structure

that allows faster allocation, traversal and optional projection. The tree nodes do

not store their labels (item identifiers). There is no concept of header table. The

data structure stores less administrative information in the tree node which allow the

recursive step of mining without rebuilding the tree.

Search WWH ::

Custom Search

Home