Database Reference
In-Depth Information
Mining sequential patterns containing only item a. The
a
-projected database
is
. By mining this projected database, only one additional
sequential pattern containing only item a , i.e.,
{ aaa
,
aa
,
a
,
a }
: 2, is found.
Mining sequential patterns containing item b but no item after b in the f_list. By
mining the
aa
, four additional
sequential patterns containing item b but no item after b in f_list are found. They are
{
b
-projected database:
{
a ( ab ) a
,
aba
,
( ab ) b
,
ab
}
.
Mining sequential patterns containing item c but no item after c in the f_list. The
mining of the
ab
:4,
ba
:2,
( ab )
:2,
aba
:2
}
},
proceeds as follows. One scan of the projected database generates the set of length-2
frequent sequences, which are {
c
-projected database: {
a ( abc )( ac ) c
,
ac ( bc ) a
,
( ab ) cb
,
acbc
ac
:4,
( bc )
:2,
bc
:3,
cc
:3,
ca
:2,
cb
:
3
}
. One additional scan of the
c
-projected database generates all of its projected
databases. The mining of the
ac
-projected database:
{
a ( abc )( ac ) c
,
ac ( bc ) a
,
( ab ) cb
,
acbc
}
generates the set of length-3 patterns as follows:
{
acb
:3,
acc
:
3,
. Four projected database will be generated from them. The
mining of the first one, the
( ab ) c
:2,
aca
:2
}
}
generates no length-4 pattern. The mining along this line terminates. Similarly, we
can show that the mining of the other three projected databases terminates without
generating any length-4 patterns for the
acb
-projected database:
{
ac ( bc ) a
,
( ab ) cb
,
acbc
-projected database.
Mining other subsets of sequential patterns. Other subsets of sequential patterns
can be mined similarly on their corresponding projected databases. This mining
process proceeds recursively, which derives the complete set of sequential patterns.
The detailed presentation of the FreeSpan algorithm, the proof of its completeness
and correctness, and the performance study of the algorithm are in [ 15 ].
ac
4.2
PrefixSpan
PrefixSpan [ 16 ] builds upon the concept of FreeSpan but instead of projecting se-
quence databases it examines only the prefix subsequences and projects only their
corresponding suffix subsequences into projected databases. This way, sequential
patterns are grown in each projected database by exploring only local frequent se-
quences. For a sequence s
=
a ( abc )( ac ) d ( cf )
,
a
,
aa
,
a ( ab )
and
a ( abc )
are prefixes of sequence s , but neither
ab
nor
a ( bc )
is considered as a prefix
if every item in the prefix
a ( abc )
of sequence s is frequent in database D . Also,
( abc )( ac ) d ( cf )
is the suffix w.r.t. the prefix
a
,
(_ bc )( ac ) d ( cf )
is the suffix w.r.t.
the prefix
.
The problem of mining sequential patterns can be decomposed into a set of
subproblems. Let
aa
, and
(_ c )( ac ) d ( cf )
is the suffix w.r.t. the prefix
a ( ab )
be the complete set of length-1 sequential
patterns in a sequence database D . The complete set of sequential patterns in D can
be divided into n disjoint subsets. The i -th subset (1
{
x 1
,
x 2
,
···
,
x n }
i
n ) is the set of sequential
{
x i }
{
···
, β m }
patterns with prefix
. Let α be a length- l sequential pattern and
β 1 , β 2 ,
be the set of all length-( l
1) sequential patterns with prefix α . The complete set of
sequential patterns with prefix α , except for α itself, can be divided into m disjoint
+
Search WWH ::




Custom Search