From Sequence Mining to Multidimensional Sequence Mining - Mining Complex Data

Information Technology Reference

In-Depth Information

8.5.2 General Algorithm

Based on the three steps previously described, the general algorithm for mining mul-

tidimensional sequential patterns is depicted in Fig. 8.14. The input parameters of this

algorithm are:

A multidimensional sequences database DB.

A support FT for mining sequential patterns .

A dissimilarity threshold DT for mining the most similar sequence to a given

sequential pattern .

A characterization threshold CT and a set of attributes and associated values

E(Ai., Vj).

The algorithm first mines maximal sequential patterns and places the result in D

(line 03). Each pattern becomes a model of a class of sequences. Individuals whom

sequences are the most similar to a given maximal sequential pattern are placed in the

corresponding class S c (line 05). The characterization algorithm (line 07) determines

which attribute values characterize the subset S c among E(Ai, Vj).

Mining_MSP(DB, FT, DT, CT, E(Ai,Vj))

01 D =

∅

02 R =

03 D = Maximal_Sequential_Patterns (DB, FT)

04 For each S

∅

D

05 S c = Cluster (DB, S, DT)

06 For Each S c

07

∈

R = Characterization (S c , DB, CS, E(Ai., Vj))

Fig. 8.14. General Algorithm

Characterization (class se , database DB , real S, set E of (attribute A, value V))

00 se .characterization =

;

01 compute freq DB (prop) for all properties prop = (attributes, values) ;

02 For each attribute A

03 For each value V of A

04 compute freq se ( prop ) for the property prop = (attribute, value) ;

05

∅

If F DB se ( prop ) ≥ S

Add ( se .characterization, prop ) ;

Fig. 8.15. Characterization Algorithm

The first step performs maximal sequential pattern mining . As seen before, IBM

algorithm shows the best performances. Therefore, we use this algorithm at this step

of the process. Frequent patterns are likely to have much more similar sequences in

the database than non frequent ones. Therefore, our approach of class generation

starts from those frequent patterns to determine classes' centers. We call them models .

However, not all sequential patterns can be considered as models , because some

are very similar to each others, and then do not maximize the dissimilarity between

Mining Complex Data

Search WWH ::

Custom Search

Home