Database Reference
In-Depth Information
Fig. 11.5 Projected databases and sequential patterns for PrefixSpan algorithm. (Adapted from [ 3 ])
subsets. The j -th subset (1
m ) is the set of sequential patterns prefixed with
β j . Based on this observation, the problem can be partitioned recursively. That is,
each subset of sequential patterns can be further divided when necessary. This forms
a divide-and-conquer framework. In the following, let us examine how to use the
prefix-based projection approach for mining sequential patterns based the following
example [ 3 ].
Example 4
j
(PrefixSpan) Given the sequence database D in Table 11.1 and
min _ suppor t
2, sequential patterns in S can be mined by a prefix-projection
method in the following steps.
=
1. Find length-1 sequential patterns. Scan D once to find all the frequent items in
sequences. Each of these frequent items is a length-1 sequential pattern. They
are
a
:4,
b
:4,
c
:4,
d
:3,
e
: 3, and
f
: 3, where the notation
: count ” represents the pattern and its associated support count.
2. Divide search space. The complete set of sequential patterns can be partitioned
into the following six subsets according to the six prefixes: (1) the ones with
prefix
pattern
.
3. Find subsets of sequential patterns. The subsets of sequential patterns can be
mined by constructing the corresponding set of projected databases and mining
each recursively. The projected databases as well as sequential patterns found in
them are listed in Fig. 11.5 , while the mining process is explained as follows.
(a) Find sequential patterns with prefix
a
, (2) the ones with prefix
b
, ... , and (6) the ones with prefix
f
a
. Only the sequences containing
a
should be collected. Moreover, in a sequence containing
a
, only the sub-
sequence prefixed with the first occurrence of
a
should be considered. For
example, in sequence
( ef )( ab )( df ) cb
, only the subsequence
(_ b )( df ) cb
should be considered for mining sequential patterns prefixed with
. Notice
that (_ b ) means that the last element in the prefix, which is a , together with b ,
form one element.
The sequences in D containing
a
a
are projected w.r.t.
a
to form the
a
-
projected database, which consists of four suffix sequences:
( abc )( ac ) d ( cf )
,
(_ d ) c ( bc )( ae )
,
(_ b )( df ) cb
and
(_ f ) cbc
.
 
Search WWH ::




Custom Search