Sequential Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

first time. In this ITEM_IS_EXIST_TABLE, the last position information of each

item in the sequence is recorded, and in each iteration, LAPIN-SPAM only needs to

check this table to get information that a candidate is behind current position or not

for effective candidate sequence pruning.

4

Pattern Growth Algorithms

Apriori-based algorithms generate an explosive number of candidate sequences for

mining long sequential patterns, which consume a lot of memory in the mining

process. To solve this problem, the pattern growth paradigm and the FP-Growth

algorithm [ 17 ] emerged in the early 2000 s, firstly proposed for association rule

mining. The key idea is to avoid the candidate generation and prune steps that occur

in the Apriori-based algorithms and repeatedly narrow the search space by dividing

a database into a set of smaller projected databases, which are mined separately.

4.1

FreeSpan

FreeSpan [ 15 ] mines sequential patterns by partitioning the search space and pro-

jecting the sequence sub-databases recursively based on the projected itemsets. The

database projection can be performed as follows. At the time of deriving p 's projected

database from DB, the set of frequent items X of DB is already known. Only those

items in X will need to be projected into p 's projected database. This effectively

discards irrelevant information and keeps the size of the projected database minimal.

By recursively doing so, one can mine the projected databases and generate the com-

plete set of sequential patterns in the given partition without duplication. The details

are illustrated in the following example [ 3 ].

Example 3

(FreeSpan) Given the sequence database D in Table 11.1 and

min _ suppor t

2, FreeSpan first scans D , collects the support for each item,

and finds the set of frequent items. This step is similar to GSP. Frequent items are

listed in support descending order (in the form of “ item: support ”), that is, f_list =

=

a :4, b :4, c :4, d :3, e :3, f :3

. They form six length-one sequential patterns:

:3.

According to the f_list, the complete set of sequential patterns in D can be divided

into 6 disjoint subsets: (1) the ones containing only item a , (2) the ones containing

item b but no item after b in f_list, (3) the ones containing item c but no item after c

in f_list, and so on, and finally, (6) the ones containing item f .

The sequential patterns related to the six partitioned subsets can be mined by

constructing six projected databases (obtained by one additional scan of the original

database). Infrequent items, such as g in this example, are removed from the projected

databases. The process for mining each projected database is detailed as follows.

a

:4,

b

:4,

c

:4,

d

:3,

e

:3,

f

Frequent Pattern Mining

Search WWH ::

Custom Search

Home