Database Reference
In-Depth Information
a
b
c
root
root
root
Item
Item
Item
1
1
1
1:3
1:3
1:3
2:1
2
2
3
2:1
3:1
2:1
3:1
Fig. 9.4 Prefix path subtrees ending in 3 ( a ), 2 ( b ), and 1 ( c )
In the worst case, we will have the whole database, if there are no items in
common in the transactions, which is very unlikely in real-world scenarios.
To discover the frequent itemsets, the method makes use of a divide and
conquer strategy. The FP-tree is divided using a bottom-up algorithm that
finds subtrees such that each one of them is composed of the paths ending
at the same item. These are called prefix path subtrees. There are as many
subtrees as frequent items. Each subtree is built bottom-up starting from
a frequent item. For instance, for the FP-tree in Fig. 9.3 , the prefix path
subtree ending in 3 is shown in Fig. 9.4 a. The subtree is composed of all
the paths that start at the root and end at a node representing item 3 .We
proceed analogously with items 2 and 1 , yielding the subtrees of Fig. 9.4 b, c.
Note that the support of the items in each subtree is obtained by adding the
values of the counters along each linked list (the one formed by the dashed
lines). For example, in Fig. 9.4 a, the support of item 3 is 2, because two nodes
3:1 are linked by a dashed line. Thus, item 3 is frequent, and we can now find
the frequent itemsets ending in 3 . For this, we need to build the corresponding
conditional subtrees as we explain next.
For each subtree corresponding to an item i ,weconstructa conditional
subtree as follows: we take each prefix path subtree for i and remove all the
nodes containing i . For example, for the subtree corresponding to item 3 ,we
traverse the tree bottom-up starting from each node for 3 . We drop all the
nodes for item 3 (in this case, the two nodes 3:1 ) and update the counters.
Now, the node 1:3 becomes 1:2 (corresponding to the paths root
1:2
2:1 ,
and root
2:1 ). We obtain the tree in Fig. 9.5 a. From this tree we remove the
infrequent nodes, in this case, node 2:1 . The name 'conditional' arises from
the fact that this tree contains only items that appear together with item 3 .
Frequent itemsets containing the item
{
3
}
are obtained combining the nodes
 
Search WWH ::




Custom Search