Databases Reference
In-Depth Information
Table 6.2 Mining the FP-Tree by Creating Conditional (Sub-)Pattern Bases
ItemConditionalPatternBase ConditionalFP-tree FrequentPatternsGenerated
I5
ffI2, I1: 1g, fI2, I1, I3: 1gg hI2: 2, I1: 2i
fI2, I5: 2g, fI1, I5: 2g, fI2, I1, I5: 2g
I4 ffI2, I1: 1g, fI2: 1gg
hI2: 2i
fI2, I4: 2g
I3
ffI2, I1: 2g, fI2: 2g, fI1: 2gg hI2: 4, I1: 2i, hI1: 2i fI2, I3: 4g, fI1, I3: 4g, fI2, I1, I3: 2g
I1
ffI2: 4gg
hI2: 4i
fI2, I1: 4g
Support
count
null{}
Item ID
Node-link
I1:2
I2
4
I2:4
I1
4
I1:2
Figure 6.8 The conditional FP-tree associated with the conditional node I3.
Similar to the preceding analysis, I3's conditional pattern base is ffI2, I1: 2g, fI2: 2g,
fI1: 2gg. Its conditional FP-tree has two branches, hI2: 4, I1: 2i and hI1: 2i, as shown
in Figure 6.8, which generates the set of patterns ffI2, I3: 4g, fI1, I3: 4g, fI2, I1, I3: 2gg.
Finally, I1's conditional pattern base is ff I2: 4gg, with an FP-tree that contains only one
node, hI2: 4i, which generates one frequent pattern, fI2, I1: 4g. This mining process is
summarized in Figure 6.9.
The FP-growth method transforms the problem of finding long frequent patterns
into searching for shorter ones in much smaller conditional databases recursively and
then concatenating the suffix. It uses the least frequent items as a suffix, offering good
selectivity. The method substantially reduces the search costs.
When the database is large, it is sometimes unrealistic to construct a main memory-
based FP-tree. An interesting alternative is to first partition the database into a set
of projected databases, and then construct an FP-tree and mine it in each projected
database. This process can be recursively applied to any projected database if its FP-tree
still cannot fit in main memory.
A study of the FP-growth method performance shows that it is efficient and scalable
for mining both long and short frequent patterns, and is about an order of magnitude
faster than the Apriori algorithm.
6.2.5 Mining Frequent Itemsets Using the Vertical Data Format
Both the Apriori and FP-growth methods mine frequent patterns from a set of trans-
actions in TID-itemset format (i.e., f TID : itemset g), where TID is a transaction ID
and itemset is the set of items bought in transaction TID . This is known as the
horizontal data format . Alternatively, data can be presented in item-TID set format
 
Search WWH ::




Custom Search