Information Technology Reference
In-Depth Information
8.5.2 General Algorithm
Based on the three steps previously described, the general algorithm for mining mul-
tidimensional sequential patterns is depicted in Fig. 8.14. The input parameters of this
algorithm are:
A multidimensional sequences database DB.
A support FT for mining sequential patterns .
A dissimilarity threshold DT for mining the most similar sequence to a given
sequential pattern .
A characterization threshold CT and a set of attributes and associated values
E(Ai., Vj).
The algorithm first mines maximal sequential patterns and places the result in D
(line 03). Each pattern becomes a model of a class of sequences. Individuals whom
sequences are the most similar to a given maximal sequential pattern are placed in the
corresponding class S c (line 05). The characterization algorithm (line 07) determines
which attribute values characterize the subset S c among E(Ai, Vj).
Mining_MSP(DB, FT, DT, CT, E(Ai,Vj))
01 D =
02 R =
03 D = Maximal_Sequential_Patterns (DB, FT)
04 For each S
D
05 S c = Cluster (DB, S, DT)
06 For Each S c
07
R = Characterization (S c , DB, CS, E(Ai., Vj))
Fig. 8.14. General Algorithm
Characterization (class se , database DB , real S, set E of (attribute A, value V))
00 se .characterization =
;
01 compute freq DB (prop) for all properties prop = (attributes, values) ;
02 For each attribute A
03 For each value V of A
04 compute freq se ( prop ) for the property prop = (attribute, value) ;
05
If F DB se ( prop ) ≥ S
Add ( se .characterization, prop ) ;
Fig. 8.15. Characterization Algorithm
The first step performs maximal sequential pattern mining . As seen before, IBM
algorithm shows the best performances. Therefore, we use this algorithm at this step
of the process. Frequent patterns are likely to have much more similar sequences in
the database than non frequent ones. Therefore, our approach of class generation
starts from those frequent patterns to determine classes' centers. We call them models .
However, not all sequential patterns can be considered as models , because some
are very similar to each others, and then do not maximize the dissimilarity between
Search WWH ::




Custom Search