Pattern-Growth Methods - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 3.2 Conditional

(sub)-databases and

conditional FP-trees of

frequent 1-itemsets

Item

Conditional sub-database

Conditional FP-tree

{( fcam : 2), ( cb : 1)}

{

( c :3)

{( fca : 2), ( fcab : 1)}

{

( fca :3)

{( fca : 1), ( f : 1), ( c : 1)}

∅

{( fc : 3)}

{

( fc :3)

{( f : 3)}

{

( f :3)

∅

the ones having item m but no p ; ... ; and (6) the one having only item f . FP-growth

finds these subsets of frequent itemsets as follows.

Based on node-link property, we collect all the transactions that p participates by

starting from p 's head (in the header table) and following p 's node-links.

Item p derives a frequent itemset ( p : 3) and two paths in the FP-tree:

f :

4, c :3, a :3, m :2, p :2

. The first path indicates that

string “( f , c , a , m , p )” appears twice in the database. Notice although string

and

c :1, b :1, p :1

f , c , a

appears three times and

itself appears even four times, they only appear twice

together with p . Thus to study which strings appear together with p , only p 's prefix

path

counts. Similarly, the second path indicates string “( c , b , p )” appears

once in the set of transactions in TDB ,or p 's prefix path is

fcam :2

. These two prefix

paths of p , “{( fcam : 2), ( cb : 1)}”, form p 's sub-database, which is called p 's

conditional database (i.e., the sub-database under the condition of p 's existence).

Construction of an FP-tree on this conditional sub-database (which is called p 's

conditional FP-tree) leads to only one branch ( c : 3). Hence only one frequent

itemset ( cp : 3) is derived. The search for frequent itemsets having p terminates.

For item m , it derives a frequent itemset ( m : 3) and two paths

cb :1

f :4, c :3, a :

3, m :2

. Notice p appears together with m as

well, however, there is no need to include p here in the analysis since any frequent

itemsets involving p has been analyzed in the previous examination of p . Similar

to the above analysis, m 's conditional sub-database is, {( fca : 2), ( fcab : 1)}.

Constructing an FP-tree on it, we derive m 's conditional FP-tree,

and

f :4, c :3, a :3, b :1, m :1

fca :3

, a single

frequent itemset path.

Since m 's conditional FP-tree,

, has a single branch, instead of re-

cursively constructing its conditional FP-trees, one can simply enumerate all the

combinations of its components, i.e., {( a : 3), ( c : 3), ( f : 3), ( ca : 3), ( fa : 3),

( fca : 3), ( fc : 3)}. Such simple pattern enumeration for single-path FP-trees has

been proven truly useful at reducing mining efforts.

Similarly, the remaining frequent itemsets can be mined by constructing corre-

sponding conditional sub-databases and perform mining on them, respectively. The

conditional sub-databases and the conditional FP-trees generated are summarized in

Table 3.2 .

When the database is too big to make its FP-tree fit in memory, the database can

be projected into its conditional sub-databases (without constructing disk-based FP-

trees). Two methods can be used for the projection of a database into its conditional

sub-databases: parallel projection and partition projection . The former projects each

fca :3

Frequent Pattern Mining

Search WWH ::

Custom Search

Home