Database Reference
In-Depth Information
first time. In this ITEM_IS_EXIST_TABLE, the last position information of each
item in the sequence is recorded, and in each iteration, LAPIN-SPAM only needs to
check this table to get information that a candidate is behind current position or not
for effective candidate sequence pruning.
4
Pattern Growth Algorithms
Apriori-based algorithms generate an explosive number of candidate sequences for
mining long sequential patterns, which consume a lot of memory in the mining
process. To solve this problem, the pattern growth paradigm and the FP-Growth
algorithm [ 17 ] emerged in the early 2000 s, firstly proposed for association rule
mining. The key idea is to avoid the candidate generation and prune steps that occur
in the Apriori-based algorithms and repeatedly narrow the search space by dividing
a database into a set of smaller projected databases, which are mined separately.
4.1
FreeSpan
FreeSpan [ 15 ] mines sequential patterns by partitioning the search space and pro-
jecting the sequence sub-databases recursively based on the projected itemsets. The
database projection can be performed as follows. At the time of deriving p 's projected
database from DB, the set of frequent items X of DB is already known. Only those
items in X will need to be projected into p 's projected database. This effectively
discards irrelevant information and keeps the size of the projected database minimal.
By recursively doing so, one can mine the projected databases and generate the com-
plete set of sequential patterns in the given partition without duplication. The details
are illustrated in the following example [ 3 ].
Example 3
(FreeSpan) Given the sequence database D in Table 11.1 and
min _ suppor t
2, FreeSpan first scans D , collects the support for each item,
and finds the set of frequent items. This step is similar to GSP. Frequent items are
listed in support descending order (in the form of “ item: support ”), that is, f_list =
=
a :4, b :4, c :4, d :3, e :3, f :3
. They form six length-one sequential patterns:
:3.
According to the f_list, the complete set of sequential patterns in D can be divided
into 6 disjoint subsets: (1) the ones containing only item a , (2) the ones containing
item b but no item after b in f_list, (3) the ones containing item c but no item after c
in f_list, and so on, and finally, (6) the ones containing item f .
The sequential patterns related to the six partitioned subsets can be mined by
constructing six projected databases (obtained by one additional scan of the original
database). Infrequent items, such as g in this example, are removed from the projected
databases. The process for mining each projected database is detailed as follows.
a
:4,
b
:4,
c
:4,
d
:3,
e
:3,
f
Search WWH ::




Custom Search