Sequential Pattern Mining - Frequent Pattern Mining - page 268

Database Reference

In-Depth Information

Fig. 11.5 Projected databases and sequential patterns for PrefixSpan algorithm. (Adapted from [ 3 ])

subsets. The j -th subset (1

m ) is the set of sequential patterns prefixed with

β j . Based on this observation, the problem can be partitioned recursively. That is,

each subset of sequential patterns can be further divided when necessary. This forms

a divide-and-conquer framework. In the following, let us examine how to use the

prefix-based projection approach for mining sequential patterns based the following

example [ 3 ].

Example 4

≤

j

≤

(PrefixSpan) Given the sequence database D in Table 11.1 and

min _ suppor t

2, sequential patterns in S can be mined by a prefix-projection

method in the following steps.

=

1. Find length-1 sequential patterns. Scan D once to find all the frequent items in

sequences. Each of these frequent items is a length-1 sequential pattern. They

are

a

:4,

b

:4,

c

:4,

d

:3,

e

: 3, and

f

: 3, where the notation

: count ” represents the pattern and its associated support count.

2. Divide search space. The complete set of sequential patterns can be partitioned

into the following six subsets according to the six prefixes: (1) the ones with

prefix

“

pattern

.

3. Find subsets of sequential patterns. The subsets of sequential patterns can be

mined by constructing the corresponding set of projected databases and mining

each recursively. The projected databases as well as sequential patterns found in

them are listed in Fig. 11.5 , while the mining process is explained as follows.

(a) Find sequential patterns with prefix

a

, (2) the ones with prefix

b

, ... , and (6) the ones with prefix

f

a

. Only the sequences containing

a

should be collected. Moreover, in a sequence containing

a

, only the sub-

sequence prefixed with the first occurrence of

a

should be considered. For

example, in sequence

( ef )( ab )( df ) cb

, only the subsequence

(_ b )( df ) cb

should be considered for mining sequential patterns prefixed with

. Notice

that (_ b ) means that the last element in the prefix, which is a , together with b ,

form one element.

The sequences in D containing

a

a

are projected w.r.t.

a

to form the

a

-

projected database, which consists of four suffix sequences:

( abc )( ac ) d ( cf )

,

(_ d ) c ( bc )( ae )

,

(_ b )( df ) cb

and

(_ f ) cbc

.

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home