Database Reference
In-Depth Information
Table 3.6 Projected databases and sequential patterns
Prefix
Projected (postfix) database
Sequential patterns
a
( abc )( ac ) d ( cf )
,
(_ d ) c ( bc )( ae )
,
a
,
aa
,
ab
,
a ( bc )
,
a ( bc ) a
,
aba
,
(_ b )( df ) cb
,
(_ f ) cbc
abc
,
( ab )
,
( ab ) c
,
( ab ) d
,
( ab ) f
,
( ab ) dc
,
ac
,
aca
,
acb
,
acc
,
ad
,
adc
,
af
b
(_ c )( ac ) d ( cf )
,
(_ c )( ae )
,
b
,
ba
,
bc
,
( bc )
,
( bc ) a
,
bd
,
bdc
,
( df ) cb
,
c
bf
c
( ac ) d ( cf )
,
( bc )( ae )
,
b
,
bc
c
,
ca
,
cb
,
cc
d
( cf )
,
c ( bc )( ae )
,
(_ f ) cb
d
,
db
,
dc
,
dcb
e
(_ f )( ab )( df ) cb
,
( af ) cbc
e
,
ea
,
eab
,
eac
,
eacb
,
eb
,
ebc
,
ec
,
ecb
,
ef
,
ef b
,
ef c
,
ef c b
.
f
( ab )( df ) cb
,
cbc
f
,
fb
,
fbc
,
fc
,
fcb
-projected
database. It is the collection that contains only those subsequences prefixed with the
first occurrence of
The sequential patterns with prefix
a
are mined in the (prefix)
a
a
. For example, in sequence
( ef )( ab )( df ) cb
, only the subse-
quence
should count. Notice that (_ b ) means that the last element in the
prefix, which is a , together with b , form one element (i.e., occurring together). Thus
the
(_ b )( df ) cb
a
-projected database consists of four postfix sequences:
( abc )( ac ) d ( cf )
,
(_ d ) c ( bc )( ae )
,
(_ b )( df ) cb
and
(_ f ) cbc
. By scanning
a
-projected database
once, all the length-2 sequential patterns prefixed with
a
can be found. They are:
(
aa
: 2), (
ab
: 4), (
( ab )
: 2), (
ac
: 4), (
ad
: 2), and (
af
: 2).
Recursively, all sequential patterns with prefix
a
can be partitioned into six
subsets: (1) that prefixed with
aa
, (2) that with
ab
, ... , and finally, (6) that with
. These subsets can be mined by constructing respective projected databases and
mining each recursively.
For example, the
af
-projected database consists of only one non-empty (post-
fix) subsequences prefixed with
aa
. Since there is no hope to
generate any frequent subsequence from a single sequence, the processing of
aa
:
(_ bc )( ac ) d ( cf )
aa
-
projected database terminates. Similarly, the
ab
-projected database consists of
three postfix sequences:
(_ c )( ac ) d ( cf )
,
(_ c ) a
, and
c
. Recursively mining it
returns four sequential patterns:
(_ c )
,
(_ c ) a
,
a
, and
c
(i.e.,
a ( bc )
,
a ( bc ) a
,
aba
.)
Using the same method, sequential patterns with prefix
, and
abc
,
can be mined from the corresponding projected databases respectively. The projected
databases as well as the sequential patterns found are shown in Table 3.6 .
The example shows that PrefixSpan examines only the prefix subsequences and
projects only their corresponding postfix subsequences into projected databases, and
in each projected database, sequential patterns are grown by exploring only local
frequent patterns.
To further improve mining efficiency, two kinds of optimizations are explored
[ 30 ]: (1) pseudo-projection , and (2) bi-level projection . Pseudo-projection is based
on the following idea: When the database can be held in main memory, instead of
constructing a physical projection by collecting all the postfixes, one can use pointers
b
,
c
,
d
,
e
and
f
 
Search WWH ::




Custom Search