Database Reference
In-Depth Information
that are not frequent (the ones with support less than 2 in our example),
and, for each transaction, we sort the frequent items in decreasing order
of their support. In this way, the common prefixes can be shared between
transactions.
In our example, only items 1 , 2 ,and 3 are frequent. The first transaction
in the database includes all of these items. Thus, we sort them in descending
order of support, including the support count in the form i : c . We obtain the
list
, meaning that items 1 , 2 ,and 3 have support 3, 2, and 2,
respectively. We proceed analogously with the other transactions.
In the second pass , we construct the FP-tree as follows. We first create the
root node and then perform a scan of the database, analyzing each transaction
again, but only considering the frequent items obtained in the first pass.
In our example, the first transaction produces the first branch of the tree:
root
1:3, 2:2, 3:2
3:1 . The branch is ordered according to the third column
of the table above, that is, the order defined in the first pass. The second
transaction contains items 1 and 3 , so it shares a common prefix with the
first transaction (i.e., item 1 ). Thus, we increment the support count of this
prefix to 2, and the branch will read root
1:1
2:1
3:1 . The third transaction
includes only one item whose minimum support is at least 2 (again, item 1 ).
Thus, we just increase the node count, yielding the node 1:3 . Finally, the
fourth transaction leads to a new branch of length 1, namely, root
1:2
2:1 .
Figure 9.3 shows the final state of the FP-tree.
root
Item
1
1:3
2:1
2
3
2:1
3:1
3:1
Fig. 9.3 The FP-tree for the running example
Note that this tree represents, in a compact form, the part of the database
containing only items satisfying the minimum support. Each transaction is
mapped to a path in the FP-tree, and the more paths that overlap, the higher
the compression rate achieved. In this way, it is even possible that this tree
fits in main memory. In the best case, we will have a single path in the tree.
 
Search WWH ::




Custom Search