Database Reference
In-Depth Information
Table 3.2
Conditional
(sub)-databases and
conditional FP-trees of
frequent 1-itemsets
Item
Conditional sub-database
Conditional FP-tree
p
{(
fcam
: 2), (
cb
: 1)}
{
(
c
:3)
}|
p
m
{(
fca
: 2), (
fcab
: 1)}
{
(
fca
:3)
}|
m
b
{(
fca
: 1), (
f
: 1), (
c
: 1)}
∅
a
{(
fc
: 3)}
{
(
fc
:3)
}|
a
c
{(
f
: 3)}
{
(
f
:3)
}|
c
f
∅
∅
the ones having item
m
but no
p
;
...
; and (6) the one having only item
f
. FP-growth
finds these subsets of frequent itemsets as follows.
Based on node-link property, we collect all the transactions that
p
participates by
starting from
p
's head (in the header table) and following
p
's node-links.
Item
p
derives a frequent itemset (
p
: 3) and two paths in the FP-tree:
f
:
4,
c
:3,
a
:3,
m
:2,
p
:2
. The first path indicates that
string “(
f
,
c
,
a
,
m
,
p
)” appears twice in the database. Notice although string
and
c
:1,
b
:1,
p
:1
f
,
c
,
a
appears three times and
itself appears even four times, they only appear twice
together
with
p
. Thus to study which strings appear together with
p
, only
p
's prefix
path
f
counts. Similarly, the second path indicates string “(
c
,
b
,
p
)” appears
once in the set of transactions in
TDB
,or
p
's prefix path is
fcam
:2
. These two prefix
paths of
p
, “{(
fcam
: 2), (
cb
: 1)}”, form
p
's sub-database, which is called
p
's
conditional database
(i.e., the sub-database under the condition of
p
's existence).
Construction of an FP-tree on this conditional sub-database (which is called
p
's
conditional FP-tree) leads to only one branch (
c
: 3). Hence only one frequent
itemset (
cp
: 3) is derived. The search for frequent itemsets having
p
terminates.
For item
m
, it derives a frequent itemset (
m
: 3) and two paths
cb
:1
f
:4,
c
:3,
a
:
3,
m
:2
. Notice
p
appears together with
m
as
well, however, there is no need to include
p
here in the analysis since any frequent
itemsets involving
p
has been analyzed in the previous examination of
p
. Similar
to the above analysis,
m
's conditional sub-database is, {(
fca
: 2), (
fcab
: 1)}.
Constructing an FP-tree on it, we derive
m
's conditional FP-tree,
and
f
:4,
c
:3,
a
:3,
b
:1,
m
:1
fca
:3
, a single
frequent itemset path.
Since
m
's conditional FP-tree,
, has a single branch, instead of re-
cursively constructing its conditional FP-trees, one can simply enumerate all the
combinations of its components, i.e., {(
a
: 3), (
c
: 3), (
f
: 3), (
ca
: 3), (
fa
: 3),
(
fca
: 3), (
fc
: 3)}. Such simple pattern enumeration for single-path FP-trees has
been proven truly useful at reducing mining efforts.
Similarly, the remaining frequent itemsets can be mined by constructing corre-
sponding conditional sub-databases and perform mining on them, respectively. The
conditional sub-databases and the conditional FP-trees generated are summarized in
Table
3.2
.
When the database is too big to make its FP-tree fit in memory, the database can
be projected into its conditional sub-databases (without constructing disk-based FP-
trees). Two methods can be used for the projection of a database into its conditional
sub-databases:
parallel projection
and
partition projection
. The former projects each
fca
:3