Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

containing e . The counts on the prefix paths are re-adjusted because many branches

are pruned. The removal of infrequent items and that of the item e might lead to a

conditional FP-Tree that looks very different from the conditional prefix-paths. These

kinds of conditional FP-trees need to be generated for each conditional frequent item,

although only a single item has been shown for the sake of simplicity. Note that, in

general, the pointers may need to be recreated every time a conditional FP-Tree is

created.

4.2

Variations

As the database grows larger, the construction of the FP-Tree become challenging

both from runtime and space complexity. There have been many works [ 8 , 24 , 27 , 29 ,

30 , 36 , 39 , 55 , 59 , 61 , 62 ] to tackle these challenges. These variations of FP-growth

method can be classified into two categories. Methods belonging to the first category

design memory-based mining process using a memory-resident data structure that

holds partitioned database. Methods belonging to the second category improve the

efficiency of the FP-Tree representation. In this subsection, we will present these

approaches briefly.

4.2.1

Memory-Resident Variations

In the following, a number of different memory-resident variations of the basic

FP-growth idea will be described.

CT-PRO Algorithm In this work [ 62 ], the authors introduced a new FP-Tree like

data structure called Compact FP-Tree (CFP-Tree) that holds the same information

as FP-Tree but with 50 % less storage. They also designed a mining algorithm called

CT-PRO which follows a non-recursive procedure unlike FP-growth . As discussed

earlier, during the mining process, FP-growth constructs many conditional FP-Trees,

which becomes an overhead as the patterns get longer or the support gets lower. To

overcome this problem, the CT-PRO algorithm divides the database into several

disjoint projections where each projection is represented as a CFP-Tree. Then a non-

recursive mining process is executed over each projection independently. Significant

modifications were made to the header Table 4.1 data structure. In the original FP-

Tree, the nodes store the support and item label. However, in the CFP-Tree, item

labels are mapped to an increasing sequence of integers that is actually the index of

the header table. The header table of CFP-Tree stores the support of each item. To

compress the original FP-Tree, all identical subtrees are removed by accumulating

them and storing the relevant information in the leftmost branch. The header table

contains a pointer to each node on the leftmost branch of the CFP-Tree, as these

nodes are roots of subtrees starting with different items.

The mining process starts from the pointers of the least frequent items in the header

table. This prunes a large number of nodes at an early stage and shrinks the tree

Search WWH ::

Custom Search

Home