Information Technology Reference
In-Depth Information
8.5.2 General Algorithm
Based on the three steps previously described, the general algorithm for mining mul-
tidimensional sequential patterns is depicted in Fig. 8.14. The input parameters of this
algorithm are:
A
multidimensional sequences
database DB.
A support FT for mining
sequential patterns
.
A
dissimilarity
threshold DT for mining the most similar sequence to a given
sequential pattern
.
A
characterization
threshold CT and a set of attributes and associated values
E(Ai., Vj).
The algorithm first mines
maximal sequential patterns
and places the result in D
(line 03). Each
pattern
becomes a
model
of a class of sequences. Individuals whom
sequences are the most similar to a given
maximal sequential pattern
are placed in the
corresponding class S
c
(line 05). The characterization algorithm (line 07) determines
which attribute values characterize the subset S
c
among E(Ai, Vj).
Mining_MSP(DB, FT, DT, CT, E(Ai,Vj))
01
D =
∅
02
R =
03
D = Maximal_Sequential_Patterns (DB, FT)
04 For each
S
∅
D
05
S
c
= Cluster (DB, S, DT)
06 For Each
S
c
07
∈
R = Characterization (S
c
, DB, CS, E(Ai., Vj))
Fig. 8.14.
General Algorithm
Characterization (class
se
, database
DB
, real S, set E of (attribute A, value V))
00
se
.characterization =
;
01
compute freq
DB
(prop) for all properties
prop
= (attributes, values) ;
02
For
each attribute A
03
For
each value V of A
04
compute freq
se
(
prop
) for the property
prop
= (attribute, value) ;
05
∅
If
F
DB
se
(
prop
) ≥ S
Add (
se
.characterization,
prop
) ;
Fig. 8.15.
Characterization Algorithm
The first step performs
maximal
sequential pattern
mining
. As seen before,
IBM
algorithm shows the best performances. Therefore, we use this algorithm at this step
of the process. Frequent patterns are likely to have much more similar sequences in
the database than non frequent ones. Therefore, our approach of class generation
starts from those frequent patterns to determine classes' centers. We call them
models
.
However, not all
sequential patterns
can be considered as
models
, because some
are very similar to each others, and then do not maximize the dissimilarity between