hierarchical clustering. They conclude that if random data is generated from
uniform or normal distributions, the complete-linkage dissimilarity measure
performs best. The single-linkage dissimilarity measure performs more poorly
than complete linkage, but its performance tends to improve as the number of
objects N in the dataset increases.
However, the single-linkage dissimilarity measure has the advantage of being
more versatile when dealing with non-convex clusters. Figure 5.5 shows an example
with two concentric clusters. In this figure, black dots represent objects from
class 1, and circles represent objects from class 2. A hierarchical clustering
algorithm using the single-linkage dissimilarity measure can cluster these
objects correctly, but the same algorithm using the complete-linkage
dissimilarity measure cannot.
Fig. 5.5. Two concentric clusters.
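The contrast between the two measures on concentric clusters can be reproduced with a small experiment. The sketch below is a minimal, naive agglomerative procedure written for illustration; the helper names `agglomerate` and `ring`, the ring radii, and the point counts are all assumptions and are not the data behind Fig. 5.5.

```python
import math

def agglomerate(points, k, linkage):
    """Merge the closest pair of clusters until k clusters remain.

    `linkage` is `min` for single linkage or `max` for complete linkage;
    it is applied to all point-to-point distances between two clusters.
    Clusters are lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

def ring(radius, n):
    """n evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * t / n),
             radius * math.sin(2 * math.pi * t / n)) for t in range(n)]

# Assumed toy data: class 1 is the inner ring, class 2 the outer ring.
inner = ring(1.0, 12)   # indices 0..11
outer = ring(4.0, 24)   # indices 12..35
points = inner + outer
truth = sorted([list(range(12)), list(range(12, 36))])

single = sorted(agglomerate(points, 2, min))
complete = sorted(agglomerate(points, 2, max))
print("single linkage recovers the rings:  ", single == truth)
print("complete linkage recovers the rings:", complete == truth)
```

With these particular radii, neighbouring points on each ring are closer to each other than the rings are to one another, so single linkage chains along each ring. Complete linkage penalizes the large diameter of the outer ring, so it splits that ring rather than keeping it whole.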
The other source of variation among hierarchical clustering algorithms is the
stopping criterion. Some hierarchical clustering algorithms assume that the
number of clusters K is known in advance. Such an algorithm stops when the
number of clusters it has derived equals K, so only a partial dendrogram is
generated (from N clusters to K clusters, instead of from N clusters to 1
cluster as in a full dendrogram). For instance, in Fig. 5.3, if we know that
K = 3, we obtain 3 clusters after the fifth combination. The algorithm stops and
concludes that the 3 clusters are (A, B, C), (D, E, F) and (G, H).
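The stopping rule for a known K can be sketched as follows. This is an illustrative sketch only: the function name `agglomerate_until_k`, the use of single linkage, and the 1-D coordinates are assumptions chosen so that the result matches the grouping quoted above; they are not the data behind Fig. 5.3.

```python
def agglomerate_until_k(positions, k):
    """Single-linkage agglomeration on 1-D points, stopping at k clusters.

    Returns the final clusters and the list of merges that were performed,
    which together form the partial dendrogram.
    """
    clusters = [[name] for name in positions]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(positions[x] - positions[y])
                        for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return clusters, merges

# Hypothetical 1-D coordinates for eight objects, NOT taken from Fig. 5.3.
positions = {"A": 0.0, "B": 1.0, "C": 2.5, "D": 10.0,
             "E": 11.0, "F": 12.5, "G": 20.0, "H": 21.0}

clusters, merges = agglomerate_until_k(positions, 3)
print(len(merges), "merges performed")
print(clusters)
```

Starting from N = 8 singleton clusters, each merge reduces the cluster count by one, so reaching K = 3 takes N - K = 5 merges, in line with the "fifth combination" in the text.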
Other hierarchical clustering algorithms determine K dynamically. The output of
such an algorithm is simply the set of clusters found at the level of the full
dendrogram where K clusters remain.
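As a rough illustration of choosing K from the full dendrogram, the sketch below records every level of a single-linkage dendrogram on 1-D data and then cuts it just before the largest jump in merge distance. The largest-jump rule, the function names, and the data are all assumptions made for this example; the text does not prescribe this particular rule.

```python
def full_merge_history(values):
    """Single-linkage agglomeration on 1-D values, down to one cluster.

    Returns (snapshots, heights): snapshots[i] is the clustering after
    i merges, and heights[i] is the distance at which merge i happened.
    """
    clusters = [[v] for v in sorted(values)]
    snapshots = [[list(c) for c in clusters]]
    heights = []
    while len(clusters) > 1:
        # On sorted 1-D data the closest pair of clusters is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        heights.append(gaps[i])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        snapshots.append([list(c) for c in clusters])
    return snapshots, heights

def choose_k_by_largest_jump(heights):
    """Assumed heuristic: stop merging just before the biggest jump in height.

    Requires at least two merges (three or more objects).
    """
    n = len(heights) + 1
    jumps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    cut = jumps.index(max(jumps))      # keep merges 0..cut
    return n - (cut + 1)

# Hypothetical 1-D data with three visually separated groups.
values = [0.0, 1.0, 2.5, 10.0, 11.0, 12.5, 20.0, 21.0]
snapshots, heights = full_merge_history(values)
k = choose_k_by_largest_jump(heights)
print("chosen K:", k)
print(snapshots[len(values) - k])      # dendrogram level with K clusters
```

The output clusters are read directly from the dendrogram level at which K clusters remain, so no reclustering is needed once K has been chosen.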
In the following section, we introduce several methods for determining the
number of clusters.
5.3.4. Self-Organizing Map
The self-organizing map (SOM) is a widely used data visualization and clustering
algorithm. It maps high-dimensional data into a low-dimensional (typically 2-D
or 3-D) map space. An SOM consists of components called neurons or nodes. Each