Database Reference
In-Depth Information
containing
e
. The counts on the prefix paths are re-adjusted because many branches
are pruned. The removal of infrequent items and that of the item
e
might lead to a
conditional FP-Tree that looks very different from the conditional prefix-paths. These
kinds of conditional FP-trees need to be generated for each conditional frequent item,
although only a single item has been shown for the sake of simplicity. Note that, in
general, the pointers may need to be recreated every time a conditional FP-Tree is
created.
4.2
Variations
As the database grows larger, the construction of the FP-Tree become challenging
both from runtime and space complexity. There have been many works [
8
,
24
,
27
,
29
,
30
,
36
,
39
,
55
,
59
,
61
,
62
] to tackle these challenges. These variations of
FP-growth
method can be classified into two categories. Methods belonging to the first category
design memory-based mining process using a memory-resident data structure that
holds partitioned database. Methods belonging to the second category improve the
efficiency of the FP-Tree representation. In this subsection, we will present these
approaches briefly.
4.2.1
Memory-Resident Variations
In the following, a number of different memory-resident variations of the basic
FP-growth
idea will be described.
CT-PRO Algorithm
In this work [
62
], the authors introduced a new FP-Tree like
data structure called Compact FP-Tree (CFP-Tree) that holds the same information
as FP-Tree but with 50 % less storage. They also designed a mining algorithm called
CT-PRO which follows a non-recursive procedure unlike
FP-growth
. As discussed
earlier, during the mining process,
FP-growth
constructs many conditional FP-Trees,
which becomes an overhead as the patterns get longer or the support gets lower. To
overcome this problem, the
CT-PRO
algorithm divides the database into several
disjoint projections where each projection is represented as a CFP-Tree. Then a non-
recursive mining process is executed over each projection independently. Significant
modifications were made to the header Table 4.1 data structure. In the original FP-
Tree, the nodes store the support and item label. However, in the CFP-Tree, item
labels are mapped to an increasing sequence of integers that is actually the index of
the header table. The header table of CFP-Tree stores the support of each item. To
compress the original FP-Tree, all identical subtrees are removed by accumulating
them and storing the relevant information in the leftmost branch. The header table
contains a pointer to each node on the leftmost branch of the CFP-Tree, as these
nodes are roots of subtrees starting with different items.
The mining process starts from the pointers of the least frequent items in the header
table. This prunes a large number of nodes at an early stage and shrinks the tree