Biology Reference
In-Depth Information
clustering and the results obtained for the original dataset are compared, for each
value of
k
, with the quality measures computed from the
m
randomizations. A
good clustering is one whose quality measure lies significantly above the range of
the quality measures computed from the randomized results, for the same value
of
k
.
The silhouette coefficient used here as a measure of cluster quality is based on
the idea that a good clustering should consist of
cohesive, well-separated
clusters.
Given a partitioning
of
N
objects into
k
clusters, consider any fixed object
i
and let
C
i
denote the set of indices for all objects clustered together with object
i
.
A useful measure of cohesion for this cluster is:
a
(
i
)=
1
n
i
P
d
ij
,
(15.6)
j
∈
C
i
where
n
i
is the number of objects in cluster
C
i
and
d
ij
is the dissimilarity between
objects
i
and
j
. Note that for a good clustering,
a
(
i
) should be small for all
i
.To
characterize the separation between clusters, let
K
denote the
th
neighboring
cluster, distinct from
C
i
,for
=1
,
2
,...,k
1.Define
b
(
i
) as the average dis-
similarity between object
i
in cluster
C
i
and the objects in the closest neighboring
cluster, given by:
−
!
"
1
n
b
(
i
)=min
d
ij
.
(15.7)
#
j∈K
Here, for a good clustering,
b
(
i
) should be large for all
i
. The silhouette coefficient
s
(
i
) for object
i
is then defined as the following normalized difference between
these two quantities:
b
(
i
)
−
a
(
i
)
s
(
i
)=
.
(15.8)
max
{
a
(
i
)
,b
(
i
)
}
For
s
(
i
) to be well-defined, the partitioning
must contain at least two clusters,
and every cluster must contain at least two objects. Under these conditions, it is
easily shown that
P
1 for all
i
. From the previous observations, a
good clustering should have
a
(
i
)
<< b
(
i
) for all
i
, implying
s
(
i
)
−
1
≤
s
(
i
)
≤
1 for all
i
.A
useful measure of the overall quality of the partitioning
P
is therefore the average
silhouette coefficient over all objects:
N
S
=
1
N
s
(
i
)
.
(15.9)
i
=1
Given this cluster quality measure, let
D
0
denote the original dataset and let
S
0
(
k
)
denote the value of
S
computed for the
k
-cluster partitioning of
D
0
obtained by a
Search WWH ::
Custom Search