Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods - Data Mining: Concepts and Techniques - page 259

Databases Reference

In-Depth Information

Table 6.2 Mining the FP-Tree by Creating Conditional (Sub-)Pattern Bases

ItemConditionalPatternBase ConditionalFP-tree FrequentPatternsGenerated

I5

ffI2, I1: 1g, fI2, I1, I3: 1gg hI2: 2, I1: 2i

fI2, I5: 2g, fI1, I5: 2g, fI2, I1, I5: 2g

I4 ffI2, I1: 1g, fI2: 1gg

hI2: 2i

fI2, I4: 2g

I3

ffI2, I1: 2g, fI2: 2g, fI1: 2gg hI2: 4, I1: 2i, hI1: 2i fI2, I3: 4g, fI1, I3: 4g, fI2, I1, I3: 2g

I1

ffI2: 4gg

hI2: 4i

fI2, I1: 4g

Support

count

null{}

Item ID

Node-link

I1:2

I2

4

I2:4

I1

4

I1:2

Figure 6.8 The conditional FP-tree associated with the conditional node I3.

Similar to the preceding analysis, I3's conditional pattern base is ffI2, I1: 2g, fI2: 2g,

fI1: 2gg. Its conditional FP-tree has two branches, hI2: 4, I1: 2i and hI1: 2i, as shown

in Figure 6.8, which generates the set of patterns ffI2, I3: 4g, fI1, I3: 4g, fI2, I1, I3: 2gg.

Finally, I1's conditional pattern base is ff I2: 4gg, with an FP-tree that contains only one

node, hI2: 4i, which generates one frequent pattern, fI2, I1: 4g. This mining process is

summarized in Figure 6.9.

The FP-growth method transforms the problem of finding long frequent patterns

into searching for shorter ones in much smaller conditional databases recursively and

then concatenating the suffix. It uses the least frequent items as a suffix, offering good

selectivity. The method substantially reduces the search costs.

When the database is large, it is sometimes unrealistic to construct a main memory-

based FP-tree. An interesting alternative is to first partition the database into a set

of projected databases, and then construct an FP-tree and mine it in each projected

database. This process can be recursively applied to any projected database if its FP-tree

still cannot fit in main memory.

A study of the FP-growth method performance shows that it is efficient and scalable

for mining both long and short frequent patterns, and is about an order of magnitude

faster than the Apriori algorithm.

6.2.5 Mining Frequent Itemsets Using the Vertical Data Format

Both the Apriori and FP-growth methods mine frequent patterns from a set of trans-

actions in TID-itemset format (i.e., f TID : itemset g), where TID is a transaction ID

and itemset is the set of items bought in transaction TID . This is known as the

horizontal data format . Alternatively, data can be presented in item-TID set format

Next Page

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home