within the established clusters. Therefore, we define the following cluster validation
index:
$$E(k) = \frac{1}{t-k} \sum_{z \in \{Z\}} \; \sum_{c \in \{C_z \setminus z\}} \mathrm{RRR}(z, c) \tag{12.9}$$
According to our problem setting, the more patterns occur jointly when comparing each centroid $z \in \{Z\}$ with all objects $c \in \{C_z \setminus z\}$ of the corresponding cluster, the lower $E$, the better our clustering, and the more characteristic are the corresponding prototypes.
Furthermore, we are going to evaluate the clustering of time series according to the index $I$ [24], whose value is maximized for the optimal number of clusters:
$$I(k) = \left( \frac{1}{k} \cdot \frac{E(1)}{E(k)} \cdot D_k \right)^{p} \tag{12.10}$$
The index $I$ is a composition of three factors [24], namely $1/k$, $E(1)/E(k)$, and $D_k$. The first factor will try to reduce index $I$ as the number of clusters $k$ increases. The second factor consists of the ratio of $E(1)$, which is constant for a given dataset, and $E(k)$, which decreases with increase in $k$. Consequently, index $I$ increases as $E(k)$ decreases, encouraging more clusters that are compact in nature. Finally, the third factor, $D_k$ (which measures the maximum separation between two clusters over all possible pairs of clusters), will increase with the value of $k$, but is bounded by the maximum separation between two points in the dataset.
$$D_k = \max_{i,j=1}^{k} \lVert z_i - z_j \rVert \tag{12.11}$$
Thus, the three factors are found to compete with and balance each other critically.
The power $p$ is used to control the contrast between the different cluster configurations. Previous work [24] suggests choosing $p = 2$.
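The three quantities in (12.9) to (12.11) can be combined in a few lines. The following is a minimal sketch, not the authors' implementation. It assumes that $t$ is the number of time series, that each centroid is itself a member of the dataset (as the set $C_z \setminus z$ suggests), and that the norm in (12.11) is Euclidean. The function names and the `pair_score` argument are hypothetical. `pair_score` stands in for the RRR measure, and the toy data uses a plain Euclidean distance in its place.

```python
import numpy as np

def validation_error(series, labels, centroid_idx, pair_score):
    """E(k) as in (12.9): sum pair_score(z, c) over each centroid z and every
    other member c of its cluster, divided by t - k (requires t > k)."""
    t, k = len(series), len(centroid_idx)
    total = 0.0
    for cluster, z in enumerate(centroid_idx):
        for c in range(t):
            if labels[c] == cluster and c != z:
                total += pair_score(series[z], series[c])
    return total / (t - k)

def max_separation(centroids):
    """D_k as in (12.11): largest distance between any two centroids
    (Euclidean norm assumed)."""
    z = np.asarray(centroids, dtype=float)
    diffs = z[:, None, :] - z[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())

def index_i(k, e_1, e_k, d_k, p=2):
    """I(k) as in (12.10), with p = 2 as suggested in the text."""
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** p

def euclid(a, b):
    """Placeholder pairwise score; NOT the RRR measure used in the text."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

# Toy data: four short "series", two clusters, centroids at positions 0 and 2.
series = [[0, 0], [0, 1], [3, 4], [3, 6]]
e_2 = validation_error(series, [0, 0, 1, 1], [0, 2], euclid)
e_1 = validation_error(series, [0, 0, 0, 0], [0], euclid)
d_2 = max_separation([series[0], series[2]])
print(index_i(2, e_1, e_2, d_2))
```

Evaluating `index_i` over a range of candidate values of $k$ and keeping the maximizer mirrors how the index is described above; for $k = 1$ the separation $D_1$ is zero, so the index is zero as well.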
The index $I$ has been found to be consistent and reliable, irrespective of the underlying clustering technique and data dimensionality, and furthermore has been shown to outperform the Dunn and Davies-Bouldin indices [24].
12.7 Dimensionality Reduction
As with most problems in computer science, the suitable choice of representation greatly affects the ease and efficiency of time series data mining [15]. Piecewise Aggregate Approximation (PAA), a popular windowed averaging technique, reduces a time series $x$ of length $n$ to length $n/r$ by dividing the data into $r$ equal-sized frames. The mean value of the data falling within a frame is calculated and a vector of these values becomes the data-reduced representation.
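The frame-averaging step can be illustrated with a short sketch. It assumes that the frame length divides $n$ evenly, and it reads $r$ as the number of samples per frame so that the reduced series has $n/r$ values; that reading is an interpretation rather than something the text states. The function name `paa` and the parameter `frame_len` are hypothetical.

```python
import numpy as np

def paa(x, frame_len):
    """Piecewise Aggregate Approximation: replace each consecutive frame of
    `frame_len` samples by its mean (assumes frame_len divides len(x))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if frame_len <= 0 or n % frame_len != 0:
        raise ValueError("frame_len must be positive and divide len(x)")
    # One row per frame, then average across each row.
    return x.reshape(n // frame_len, frame_len).mean(axis=1)

reduced = paa([1, 2, 3, 4, 5, 6], frame_len=2)
print(reduced)
```

Series whose length is not a multiple of the frame length would need padding, truncation, or fractional weighting, which this sketch leaves out.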