Silhouette Coefficient
A measure that combines both cohesion and separation is the silhouette
coefficient. For a single observation it is computed as
$$ s_i = \frac{b_i - a_i}{\max(a_i, b_i)} $$
where $a_i$ is the average distance between the $i$th observation and all other
observations in the same cluster, and $b_i$ is the minimum average distance of the
$i$th observation to all other clusters.
The silhouette coefficient ranges in value from -1 to 1. When an observation
is closer on average to observations in another cluster than to observations in its
own cluster, then $b_i$ is less than $a_i$ and the coefficient is negative, an
undesirable result. The ideal silhouette coefficient is 1, which occurs when $a_i$
is 0 (all observations in the cluster congregate at the centroid).
To find the silhouette coefficient for a cluster, compute the average silhouette
coefficient of all observations in the cluster. To find a clustering's overall
silhouette coefficient, compute the average silhouette coefficient of all observations.
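The per-observation, per-cluster, and overall silhouette computations described
above can be sketched in a few lines of Python. This is a minimal illustration,
not a reference implementation: the function names, the use of Euclidean distance,
and the choice to assign 0 to an observation that is alone in its cluster are all
assumptions made for this example. It also assumes at least two clusters exist.

```python
import math  # math.dist requires Python 3.8+


def silhouette_values(points, labels):
    """Return s_i = (b_i - a_i) / max(a_i, b_i) for every observation.

    Assumes Euclidean distance and at least two distinct cluster labels.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Assumption: a singleton cluster gets s_i = 0 (a_i is undefined).
            values.append(0.0)
            continue
        # a_i: average distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b_i: smallest average distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def cluster_silhouette(values, labels, label):
    """Average silhouette coefficient of the observations in one cluster."""
    own = [v for v, l in zip(values, labels) if l == label]
    return sum(own) / len(own)


# Hypothetical toy data: two tight, well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]   # each cluster mixes the two pairs

good = silhouette_values(points, good_labels)
bad = silhouette_values(points, bad_labels)
overall_good = sum(good) / len(good)   # overall coefficient for the clustering
overall_bad = sum(bad) / len(bad)
```

With the well-separated labeling every coefficient is positive and fairly close
to 1; with the mixed labeling every observation is closer on average to the other
cluster, so every coefficient is negative.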
Correlation Coefficient
Another measure of clustering validity that combines both separation and
cohesion is the coefficient of correlation between distance and correctness.
Suppose that an m by m distance matrix (D) is constructed to hold distances
between all observation pairings, where $d_{ij}$ is the distance between observation i
and observation j. A second m by m indicator matrix (C) is constructed to hold a
0/1 indicator value reflecting whether the corresponding observations are in the
same cluster (0) or in different clusters (1). See Figure 7.5. Therefore, $c_{ij}$ is 0 if
observations i and j are in the same cluster and 1 otherwise. The coefficient
of correlation for the clustering is computed as the Pearson correlation between
the two matrices, pairing only values below the main diagonal (due to
symmetry). In a valid clustering, larger distance values will more likely be
paired with 1's in the indicator matrix, and smaller distance values will more
likely be paired with 0's (observations in the same cluster).
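As a rough sketch of this calculation, the Python below builds the paired values
from below the main diagonal of the distance and indicator matrices and computes
their Pearson correlation. The function names and the use of Euclidean distance
are assumptions made for illustration. The sketch does not guard against the
degenerate case where all distances or all indicator values are identical, in
which the correlation is undefined.

```python
import math  # math.dist requires Python 3.8+


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Undefined (division by zero) if either sequence is constant.
    return cov / (spread_x * spread_y)


def clustering_correlation(points, labels):
    """Correlate distances d_ij with indicators c_ij for all pairs j < i."""
    d_vals, c_vals = [], []
    for i in range(1, len(points)):
        for j in range(i):  # entries below the main diagonal only
            d_vals.append(math.dist(points[i], points[j]))      # d_ij
            c_vals.append(0 if labels[i] == labels[j] else 1)   # c_ij
    return pearson(d_vals, c_vals)


# Hypothetical toy data: two tight, well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
r_good = clustering_correlation(points, [0, 0, 1, 1])
r_bad = clustering_correlation(points, [0, 1, 0, 1])
```

For the labeling that matches the two pairs, large distances line up with 1's
and the correlation is strongly positive; for the mixed labeling the correlation
turns negative.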
Self-Organizing Maps (SOM)
The self-organizing map algorithm was developed by Teuvo Kohonen. It is
similar in approach to K-means and its variants. The primary differences are
 