Sequential Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

Mining sequential patterns containing only item a. The

-projected database

. By mining this projected database, only one additional

sequential pattern containing only item a , i.e.,

{ aaa

a }

: 2, is found.

Mining sequential patterns containing item b but no item after b in the f_list. By

mining the

, four additional

sequential patterns containing item b but no item after b in f_list are found. They are

{

-projected database:

{

a ( ab ) a

aba

( ab ) b

}

Mining sequential patterns containing item c but no item after c in the f_list. The

mining of the

:4,

:2,

( ab )

:2,

aba

}

proceeds as follows. One scan of the projected database generates the set of length-2

frequent sequences, which are {

-projected database: {

a ( abc )( ac ) c

ac ( bc ) a

( ab ) cb

acbc

:4,

( bc )

:2,

:3,

:2,

}

. One additional scan of the

-projected database generates all of its projected

databases. The mining of the

-projected database:

{

a ( abc )( ac ) c

ac ( bc ) a

( ab ) cb

acbc

}

generates the set of length-3 patterns as follows:

{

acb

:3,

acc

. Four projected database will be generated from them. The

mining of the first one, the

( ab ) c

:2,

aca

}

generates no length-4 pattern. The mining along this line terminates. Similarly, we

can show that the mining of the other three projected databases terminates without

generating any length-4 patterns for the

acb

-projected database:

{

ac ( bc ) a

( ab ) cb

acbc

-projected database.

Mining other subsets of sequential patterns. Other subsets of sequential patterns

can be mined similarly on their corresponding projected databases. This mining

process proceeds recursively, which derives the complete set of sequential patterns.

The detailed presentation of the FreeSpan algorithm, the proof of its completeness

and correctness, and the performance study of the algorithm are in [ 15 ].

4.2

PrefixSpan

PrefixSpan [ 16 ] builds upon the concept of FreeSpan but instead of projecting se-

quence databases it examines only the prefix subsequences and projects only their

corresponding suffix subsequences into projected databases. This way, sequential

patterns are grown in each projected database by exploring only local frequent se-

quences. For a sequence s

a ( abc )( ac ) d ( cf )

a ( ab )

and

a ( abc )

are prefixes of sequence s , but neither

nor

a ( bc )

is considered as a prefix

if every item in the prefix

a ( abc )

of sequence s is frequent in database D . Also,

( abc )( ac ) d ( cf )

is the suffix w.r.t. the prefix

(_ bc )( ac ) d ( cf )

is the suffix w.r.t.

the prefix

The problem of mining sequential patterns can be decomposed into a set of

subproblems. Let

, and

(_ c )( ac ) d ( cf )

is the suffix w.r.t. the prefix

a ( ab )

be the complete set of length-1 sequential

patterns in a sequence database D . The complete set of sequential patterns in D can

be divided into n disjoint subsets. The i -th subset (1

{

x 1

x 2

···

x n }

≤

n ) is the set of sequential

{

x i }

{

···

, β m }

patterns with prefix

. Let α be a length- l sequential pattern and

β 1 , β 2 ,

be the set of all length-( l

1) sequential patterns with prefix α . The complete set of

sequential patterns with prefix α , except for α itself, can be divided into m disjoint

Frequent Pattern Mining

Search WWH ::

Custom Search

Home