within the established clusters. Therefore, we define the following cluster validation
index:
$$E(k) = \frac{1}{t-k} \sum_{z \in \{Z\}} \; \sum_{c \in \{C_z \setminus z\}} RRR(z, c) \tag{12.9}$$
According to our problem setting, the more patterns occur jointly when comparing
each centroid $z \in \{Z\}$ with all objects $c \in \{C_z \setminus z\}$ of the corresponding cluster, the
lower E , the better our clustering, and the more characteristic are the corresponding
prototypes.
Furthermore, we are going to evaluate the clustering of time series according to
the index I [ 24 ], whose value is maximized for the optimal number of clusters:
$$I(k) = \left( \frac{1}{k} \cdot \frac{E(1)}{E(k)} \cdot D_k \right)^{p} \tag{12.10}$$
The index I is a composition of three factors [ 24 ], namely $1/k$, $E(1)/E(k)$, and
$D_k$. The first factor will try to reduce index I as the number of clusters k increases.
The second factor consists of the ratio of $E(1)$, which is constant for a given dataset,
and $E(k)$, which decreases with increase in k . Consequently, index I increases as
$E(k)$ decreases, encouraging more clusters that are compact in nature. Finally, the
third factor, $D_k$ (which measures the maximum separation between two clusters over
all possible pairs of clusters), will increase with the value of k , but is bounded by the
maximum separation between two points in the dataset.
$$D_k = \max_{i,j=1}^{k} \lVert z_i - z_j \rVert \tag{12.11}$$
Thus, the three factors are found to compete with and balance each other critically.
The power p is used to control the contrast between the different cluster configurations.
Previous work [ 24 ] suggests choosing p = 2.
The index I has been found to be consistent and reliable, irrespective of the
underlying clustering technique and data dimensionality, and furthermore has been
shown to outperform the Dunn and Davies-Bouldin indices [ 24 ].
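The interplay of the three factors can be made concrete with a minimal sketch. It is not the authors' implementation and rests on several assumptions: centroids are members of the dataset and are identified by their position in `series`, the distance function `dist` is a caller-supplied stand-in for RRR, E(k) is normalized by t - k (the number of non-centroid objects), and the norm in (12.11) is taken to be Euclidean. The function names are hypothetical.

```python
import numpy as np

def validation_error(series, labels, centroid_idx, dist):
    """E(k): average distance between each centroid and the other members
    of its cluster. Assumes centroid_idx[j] is the position in `series` of
    the centroid of cluster j, and normalizes by t - k (an assumption)."""
    t, k = len(series), len(centroid_idx)
    if t == k:
        return 0.0  # every object is its own centroid, nothing to compare
    total = 0.0
    for j, z in enumerate(centroid_idx):
        for c in range(t):
            if labels[c] == j and c != z:
                total += dist(series[z], series[c])
    return total / (t - k)

def max_centroid_separation(centroids):
    """D_k: largest pairwise distance between centroids (Euclidean norm assumed)."""
    z = np.asarray(centroids, dtype=float)
    diffs = z[:, None, :] - z[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())

def index_i(k, e1, ek, dk, p=2):
    """I(k) = ((1/k) * (E(1)/E(k)) * D_k) ** p, with p = 2 by default."""
    if ek == 0:
        raise ValueError("E(k) must be non-zero")
    return ((1.0 / k) * (e1 / ek) * dk) ** p
```

To select the number of clusters, one would evaluate `index_i` for each candidate k and keep the k with the largest value. Note that for k = 1 the sketch yields D_1 = 0, so I(1) = 0.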
12.7 Dimensionality Reduction
As with most problems in computer science, the suitable choice of representation
greatly affects the ease and efficiency of time series data mining [ 15 ]. Piecewise
Aggregate Approximation (PAA), a popular windowed averaging technique, reduces
a time series x of length n to length n/r by dividing the data into r equal-sized frames.
The mean value of the data falling within a frame is calculated and a vector of these
values becomes the data-reduced representation.
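The frame-averaging step can be sketched in a few lines. This is an illustrative version only: the function name `paa` and the parameter `num_frames` are hypothetical, and the sketch assumes the series length is an exact multiple of the number of frames, so that all frames have the same size.

```python
import numpy as np

def paa(x, num_frames):
    """Reduce a series to `num_frames` values by averaging equal-sized frames.
    Assumes len(x) is an exact multiple of num_frames."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if num_frames <= 0 or n % num_frames != 0:
        raise ValueError("series length must be a positive multiple of num_frames")
    # Each row of the reshaped array is one frame; its mean is the reduced value.
    return x.reshape(num_frames, n // num_frames).mean(axis=1)

reduced = paa([1, 2, 3, 4, 5, 6], 3)  # frames (1, 2), (3, 4), (5, 6) -> [1.5, 3.5, 5.5]
```

Series whose length is not a multiple of the frame count would need padding, truncation, or fractional frame weights, which this sketch leaves out.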