Digital Signal Processing Reference
In this sense, the main objective of cluster validity is to determine the
optimal number of clusters that provide the best characterization of a
given multidimensional data set. An incorrect assignment of values to
the parameter of a clustering algorithm results in a data-partitioning
scheme that is not optimal, and thus leads to wrong decisions.
In this section, we evaluate the performance of the clustering techniques
in conjunction with three cluster validity indices: Kim's index, the
Calinski-Harabasz (CH) index, and the intraclass index. These indices
were successfully applied earlier in biomedical time-series analysis
[97]. In the following, we describe the above-mentioned indices.
Calinski-Harabasz index [39]: This index is computed for $m$ data
points and $K$ clusters as

$$\mathrm{CH} = \frac{\operatorname{trace} B / (K - 1)}{\operatorname{trace} W / (m - K)} \tag{6.46}$$

where $B$ and $W$ represent the between- and within-cluster scatter
matrices.
The maximum hierarchy level is used to indicate the correct number
of partitions in the data.
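As a rough illustration of Equation (6.46), the sketch below computes the CH index with NumPy. It assumes the usual centroid-based scatter definitions, which the text does not spell out: trace $B$ is taken as the sum over clusters of $n_k$ times the squared distance between the cluster centroid and the overall mean, and trace $W$ as the sum of squared distances of points to their own cluster centroid. The function name `calinski_harabasz` and the `labels` argument are hypothetical, introduced only for this example.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """CH = [trace(B) / (K - 1)] / [trace(W) / (m - K)].

    X      : (m, d) array of data points
    labels : length-m array of cluster labels (assumed input format)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    m = X.shape[0]
    clusters = np.unique(labels)
    K = clusters.size
    if K < 2 or K >= m:
        raise ValueError("the CH index needs 2 <= K < m")

    overall_mean = X.mean(axis=0)
    trace_B = 0.0  # between-cluster scatter (trace only)
    trace_W = 0.0  # within-cluster scatter (trace only)
    for k in clusters:
        members = X[labels == k]
        centroid = members.mean(axis=0)
        trace_B += members.shape[0] * np.sum((centroid - overall_mean) ** 2)
        trace_W += np.sum((members - centroid) ** 2)

    return (trace_B / (K - 1)) / (trace_W / (m - K))


# Two tight, well-separated clusters of two points each.
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(calinski_harabasz(X, [0, 0, 1, 1]))
```

Only the traces are accumulated, so the full scatter matrices are never formed; this is sufficient because Equation (6.46) uses nothing but trace $B$ and trace $W$.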
Intraclass index [97]: This index is given as

$$I_W = \frac{1}{n} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \| x_i - w_k \|^2 \tag{6.47}$$
where $n_k$ is the number of points in cluster $k$ and $w_k$ is a prototype
associated with the $k$th cluster. $I_W$ is computed for different cluster
numbers. The maximum value of the second derivative of $I_W$ as a
function of cluster number is taken as an estimate for the optimal
partition. This index provides a possible way of assessing the quality
of a partition of $K$ clusters.
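The selection rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: $n$ is taken to be the total number of data points (the text does not define it), cluster labels are assumed to be integers that index rows of the prototype array, and the second derivative is approximated by a central second difference over equally spaced cluster numbers. The names `intraclass_index` and `pick_k_by_second_difference` are hypothetical.

```python
import numpy as np

def intraclass_index(X, labels, prototypes):
    """I_W = (1/n) * sum_k sum_{i in cluster k} ||x_i - w_k||^2.

    Assumes n is the total number of points and labels[i] is the
    row index in `prototypes` of the prototype w_k for point x_i.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    prototypes = np.asarray(prototypes, dtype=float)
    n = X.shape[0]
    # prototypes[labels] lines up each point with its own prototype.
    return np.sum((X - prototypes[labels]) ** 2) / n

def pick_k_by_second_difference(ks, iw_values):
    """Return the K at which the discrete second derivative of I_W peaks.

    Uses the central second difference I_W(K-1) - 2 I_W(K) + I_W(K+1),
    so only interior cluster numbers can be selected.
    """
    ks = np.asarray(ks)
    iw = np.asarray(iw_values, dtype=float)
    if iw.size < 3:
        raise ValueError("need I_W for at least three cluster numbers")
    second_diff = iw[:-2] - 2.0 * iw[1:-1] + iw[2:]
    return int(ks[1:-1][np.argmax(second_diff)])


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(intraclass_index(X, [0, 0, 1, 1], [[0, 1], [10, 1]]))

# Illustrative (made-up) I_W values for K = 1..5, showing an "elbow".
print(pick_k_by_second_difference([1, 2, 3, 4, 5], [100, 40, 5, 4, 3]))
```

Because the second difference is undefined at the endpoints, $I_W$ should be evaluated for at least one cluster number below and one above the range of plausible answers.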
Kim's index [138]: This index equals the sum of the overpartition,
$v_o(K, X, W)$, and the underpartition, $v_u(K, X, W)$, function measures

$$I_{Kim} = \frac{v_u(K) - v_{u\min}}{v_{u\max} - v_{u\min}} + \frac{v_o(K) - v_{o\min}}{v_{o\max} - v_{o\min}}. \tag{6.48}$$
where $v_u(K)$ is the underpartitioned average over the cluster number of
the mean intracluster distance, and measures the structural compactness