Databases Reference
In-Depth Information
null{}
Support
count
I2:7
Item ID
Node-link
I1:2
I2
I1
I3
I4
I5
7
6
6
2
2
I1:4
I4:1
I3:2
I3:2
I5:1
I4:1
I3:2
I5:1
Figure 6.7 An FP-tree registers compressed, frequent pattern information.
when considering the branch to be added for a transaction, the count of each node along
a common prefix is incremented by 1, and nodes for the items following the prefix are
created and linked accordingly.
To facilitate tree traversal, an item header table is built so that each item points to its
occurrences in the tree via a chain of node-links . The tree obtained after scanning all
the transactions is shown in Figure 6.7 with the associated node-links. In this way, the
problem of mining frequent patterns in databases is transformed into that of mining the
FP-tree.
The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an
initial suffix pattern ), construct its conditional pattern base (a “sub-database,” which
consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern),
then construct its ( conditional ) FP-tree, and perform mining recursively on the tree. The
pattern growth is achieved by the concatenation of the suffix pattern with the frequent
patterns generated from a conditional FP-tree.
Mining of the FP-tree is summarized in Table 6.2 and detailed as follows. We first
consider I5, which is the last item in L , rather than the first. The reason for starting at
the end of the list will become apparent as we explain the FP-tree mining process. I5
occurs in two FP-tree branches of Figure 6.7. (The occurrences of I5 can easily be found
by following its chain of node-links.) The paths formed by these branches are hI2, I1,
I5: 1i and hI2, I1, I3, I5: 1i. Therefore, considering I5 as a suffix, its corresponding two
prefix paths are hI2, I1: 1i and hI2, I1, I3: 1i, which form its conditional pattern base.
Using this conditional pattern base as a transaction database, we build an I5-conditional
FP-tree, which contains only a single path, hI2: 2, I1: 2i; I3 is not included because its
support count of 1 is less than the minimum support count. The single path generates
all the combinations of frequent patterns: fI2, I5: 2g, fI1, I5: 2g, fI2, I1, I5: 2g.
For I4, its two prefix paths form the conditional pattern base, ffI2 I1: 1g, fI2: 1gg,
which generates a single-node conditional FP-tree, hI2: 2i, and derives one frequent
pattern, fI2, I4: 2g.
 
Search WWH ::




Custom Search