Database Reference
In-Depth Information
Table 3.2 Conditional
(sub)-databases and
conditional FP-trees of
frequent 1-itemsets
Item
Conditional sub-database
Conditional FP-tree
p
{( fcam : 2), ( cb : 1)}
{
( c :3)
}|
p
m
{( fca : 2), ( fcab : 1)}
{
( fca :3)
}|
m
b
{( fca : 1), ( f : 1), ( c : 1)}
a
{( fc : 3)}
{
( fc :3)
}|
a
c
{( f : 3)}
{
( f :3)
}|
c
f
the ones having item m but no p ; ... ; and (6) the one having only item f . FP-growth
finds these subsets of frequent itemsets as follows.
Based on node-link property, we collect all the transactions that p participates by
starting from p 's head (in the header table) and following p 's node-links.
Item p derives a frequent itemset ( p : 3) and two paths in the FP-tree:
f :
4, c :3, a :3, m :2, p :2
. The first path indicates that
string “( f , c , a , m , p )” appears twice in the database. Notice although string
and
c :1, b :1, p :1
f , c , a
appears three times and
itself appears even four times, they only appear twice
together with p . Thus to study which strings appear together with p , only p 's prefix
path
f
counts. Similarly, the second path indicates string “( c , b , p )” appears
once in the set of transactions in TDB ,or p 's prefix path is
fcam :2
. These two prefix
paths of p , “{( fcam : 2), ( cb : 1)}”, form p 's sub-database, which is called p 's
conditional database (i.e., the sub-database under the condition of p 's existence).
Construction of an FP-tree on this conditional sub-database (which is called p 's
conditional FP-tree) leads to only one branch ( c : 3). Hence only one frequent
itemset ( cp : 3) is derived. The search for frequent itemsets having p terminates.
For item m , it derives a frequent itemset ( m : 3) and two paths
cb :1
f :4, c :3, a :
3, m :2
. Notice p appears together with m as
well, however, there is no need to include p here in the analysis since any frequent
itemsets involving p has been analyzed in the previous examination of p . Similar
to the above analysis, m 's conditional sub-database is, {( fca : 2), ( fcab : 1)}.
Constructing an FP-tree on it, we derive m 's conditional FP-tree,
and
f :4, c :3, a :3, b :1, m :1
fca :3
, a single
frequent itemset path.
Since m 's conditional FP-tree,
, has a single branch, instead of re-
cursively constructing its conditional FP-trees, one can simply enumerate all the
combinations of its components, i.e., {( a : 3), ( c : 3), ( f : 3), ( ca : 3), ( fa : 3),
( fca : 3), ( fc : 3)}. Such simple pattern enumeration for single-path FP-trees has
been proven truly useful at reducing mining efforts.
Similarly, the remaining frequent itemsets can be mined by constructing corre-
sponding conditional sub-databases and perform mining on them, respectively. The
conditional sub-databases and the conditional FP-trees generated are summarized in
Table 3.2 .
When the database is too big to make its FP-tree fit in memory, the database can
be projected into its conditional sub-databases (without constructing disk-based FP-
trees). Two methods can be used for the projection of a database into its conditional
sub-databases: parallel projection and partition projection . The former projects each
fca :3
 
Search WWH ::




Custom Search