Discovery of Driving Behavior Patterns - Smart Information Systems: Computational Intelligence for Real-Life Applications

within the established clusters. Therefore, we define the following cluster validation index:

$$E(k) = -\sum_{z \in \{Z\}} \; \sum_{c \in \{C_z \setminus z\}} RRR(z, c) \tag{12.9}$$
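As a minimal sketch, the index in (12.9) could be evaluated as follows, reading it as the negated sum of RRR(z, c) over every centroid z and the remaining members c of its cluster. The function names, the dictionary layout, and in particular `toy_similarity` are assumptions made for illustration; `toy_similarity` is only a stand-in for the RRR measure, which is not defined in this excerpt.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Series = Sequence[float]


def validation_index_e(
    clusters: Dict[Hashable, List[Series]],
    centroids: Dict[Hashable, Series],
    rrr: Callable[[Series, Series], float],
) -> float:
    """Return E = -(sum over clusters of sum of rrr(centroid, member)).

    Assumption: each centroid is itself one of the objects stored in its
    cluster, so it is skipped by identity to realise C_z \\ z.
    """
    total = 0.0
    for label, members in clusters.items():
        z = centroids[label]
        for c in members:
            if c is z:  # exclude the centroid itself
                continue
            total += rrr(z, c)
    return -total


def toy_similarity(a: Series, b: Series, tol: float = 0.5) -> float:
    """Hypothetical stand-in for RRR: share of aligned points within tol."""
    pairs = list(zip(a, b))
    return sum(abs(x - y) <= tol for x, y in pairs) / len(pairs)


# Illustrative data only: cluster "a" has a centroid plus two members,
# cluster "b" consists of its centroid alone and contributes nothing.
z_a = [1.0, 2.0, 3.0]
z_b = [7.0, 7.0]
example_clusters = {"a": [z_a, [1.0, 2.0, 3.2], [1.0, 5.0, 9.0]], "b": [z_b]}
example_centroids = {"a": z_a, "b": z_b}
print(validation_index_e(example_clusters, example_centroids, toy_similarity))
```

With any similarity that grows as more patterns are shared, a more negative result indicates members that resemble their centroid more closely.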

According to our problem setting, the more patterns occur jointly when comparing each centroid $z \in \{Z\}$ with all objects $c \in \{C_z \setminus z\}$ of the corresponding cluster, the lower $E$, the better our clustering, and the more characteristic the corresponding prototypes.

Furthermore, we are going to evaluate the clustering of time series according to the index $I$ [ 24 ], whose value is maximized for the optimal number of clusters:

$$I(k) = \left( \frac{1}{k} \cdot \frac{E(1)}{E(k)} \cdot D_k \right)^{p} \tag{12.10}$$

The index $I$ is a composition of three factors [ 24 ], namely $1/k$, $E(1)/E(k)$, and $D_k$. The first factor will try to reduce index $I$ as the number of clusters $k$ increases. The second factor consists of the ratio of $E(1)$, which is constant for a given dataset, and $E(k)$, which decreases as $k$ increases. Consequently, index $I$ increases as $E(k)$ decreases, encouraging more clusters that are compact in nature. Finally, the third factor, $D_k$ (which measures the maximum separation between two clusters over all possible pairs of clusters), will increase with the value of $k$, but is bounded by the maximum separation between two points in the dataset.

$$D_k = \max_{i,j=1}^{k} \lVert z_i - z_j \rVert \tag{12.11}$$

Thus, the three factors are found to compete with and balance each other critically. The power $p$ is used to control the contrast between the different cluster configurations. Previous work [ 24 ] suggests how to choose $p$.

The index $I$ has been found to be consistent and reliable, irrespective of the underlying clustering technique and data dimensionality, and furthermore has been shown to outperform the Dunn and Davies-Bouldin indices [ 24 ].
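To make the interplay of the three factors concrete, the following sketch evaluates (12.10) and (12.11) for a given set of centroids. It assumes that E(1) and E(k) have already been computed, that centroids are fixed-length numeric vectors, and that the norm in (12.11) is Euclidean; the function and parameter names are illustrative, and the value of p in the usage lines is chosen only for demonstration.

```python
import math
from itertools import combinations
from typing import Sequence

Point = Sequence[float]


def max_centroid_separation(centroids: Sequence[Point]) -> float:
    """D_k: largest distance between any two centroids (Euclidean assumed)."""
    if len(centroids) < 2:
        return 0.0  # a single centroid has no separation from another one
    return max(math.dist(a, b) for a, b in combinations(centroids, 2))


def index_i(e_1: float, e_k: float, centroids: Sequence[Point], p: float) -> float:
    """I(k) = (1/k * E(1)/E(k) * D_k) ** p for k = len(centroids)."""
    k = len(centroids)
    d_k = max_centroid_separation(centroids)
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** p


# Illustrative values only: three centroids in the plane, made-up E values.
example_centroids = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
print(max_centroid_separation(example_centroids))
print(index_i(e_1=12.0, e_k=4.0, centroids=example_centroids, p=2))
```

To select the number of clusters, one would compute this value for each candidate k and keep the k with the largest index.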

12.7 Dimensionality Reduction

As with most problems in computer science, a suitable choice of representation greatly affects the ease and efficiency of time series data mining [ 15 ]. Piecewise Aggregate Approximation (PAA), a popular windowed averaging technique, reduces a time series $x$ of length $n$ to length $n/r$ by dividing the data into equal-sized frames of $r$ values each. The mean value of the data falling within a frame is calculated, and the vector of these values becomes the data-reduced representation.
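The windowed averaging described above can be sketched in a few lines. This sketch assumes that r denotes the number of values per frame, so that the reduced series has n/r entries, and that n is an exact multiple of r; the function name `paa` and the parameter name `frame_len` are illustrative.

```python
from typing import List, Sequence


def paa(series: Sequence[float], frame_len: int) -> List[float]:
    """Piecewise Aggregate Approximation: mean of each consecutive frame.

    Assumption: len(series) is an exact multiple of frame_len. Handling a
    ragged final frame would need an extra rule that is not covered here.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n = len(series)
    if n % frame_len != 0:
        raise ValueError("series length must be a multiple of frame_len")
    return [
        sum(series[start:start + frame_len]) / frame_len
        for start in range(0, n, frame_len)
    ]


# A series of length 8 with frames of 2 values is reduced to length 4.
print(paa([1, 2, 3, 4, 5, 6, 7, 8], frame_len=2))
```

Larger frames give a shorter and smoother representation at the cost of losing short-lived detail within each frame.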
