Database Reference
In-Depth Information
containing e . The counts on the prefix paths are re-adjusted because many branches
are pruned. The removal of infrequent items and that of the item e might lead to a
conditional FP-Tree that looks very different from the conditional prefix-paths. These
kinds of conditional FP-trees need to be generated for each conditional frequent item,
although only a single item has been shown for the sake of simplicity. Note that, in
general, the pointers may need to be recreated every time a conditional FP-Tree is
created.
4.2
Variations
As the database grows larger, the construction of the FP-Tree become challenging
both from runtime and space complexity. There have been many works [ 8 , 24 , 27 , 29 ,
30 , 36 , 39 , 55 , 59 , 61 , 62 ] to tackle these challenges. These variations of FP-growth
method can be classified into two categories. Methods belonging to the first category
design memory-based mining process using a memory-resident data structure that
holds partitioned database. Methods belonging to the second category improve the
efficiency of the FP-Tree representation. In this subsection, we will present these
approaches briefly.
4.2.1
Memory-Resident Variations
In the following, a number of different memory-resident variations of the basic
FP-growth idea will be described.
CT-PRO Algorithm In this work [ 62 ], the authors introduced a new FP-Tree like
data structure called Compact FP-Tree (CFP-Tree) that holds the same information
as FP-Tree but with 50 % less storage. They also designed a mining algorithm called
CT-PRO which follows a non-recursive procedure unlike FP-growth . As discussed
earlier, during the mining process, FP-growth constructs many conditional FP-Trees,
which becomes an overhead as the patterns get longer or the support gets lower. To
overcome this problem, the CT-PRO algorithm divides the database into several
disjoint projections where each projection is represented as a CFP-Tree. Then a non-
recursive mining process is executed over each projection independently. Significant
modifications were made to the header Table 4.1 data structure. In the original FP-
Tree, the nodes store the support and item label. However, in the CFP-Tree, item
labels are mapped to an increasing sequence of integers that is actually the index of
the header table. The header table of CFP-Tree stores the support of each item. To
compress the original FP-Tree, all identical subtrees are removed by accumulating
them and storing the relevant information in the leftmost branch. The header table
contains a pointer to each node on the leftmost branch of the CFP-Tree, as these
nodes are roots of subtrees starting with different items.
The mining process starts from the pointers of the least frequent items in the header
table. This prunes a large number of nodes at an early stage and shrinks the tree
Search WWH ::




Custom Search