cluster or some stopping criterion is satisfied. The divisive hierarchical clustering
algorithm proceeds in the opposite direction. It initially treats the whole dataset as a
single cluster. At each step, one cluster is split into two until each cluster contains
only one object or some stopping criterion is satisfied. Since the agglomerative
hierarchical clustering algorithm is more widely used than the divisive algorithm,
this section gives the details of the agglomerative algorithm only.
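The agglomerative procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the exact algorithm of the text: it assumes Euclidean distance and single linkage (the smallest pairwise distance between two clusters) as the dissimilarity measure, and it uses a target number of clusters as the stopping criterion. The names `single_link`, `agglomerative` and `target_clusters`, as well as the sample points, are hypothetical.

```python
import math
from itertools import combinations

def single_link(a, b, points):
    """Assumed dissimilarity: smallest Euclidean distance between members of a and b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)

def agglomerative(points, target_clusters=1):
    """Repeatedly combine the two closest clusters until `target_clusters` remain."""
    # Start with every object in its own cluster (clusters hold point indices).
    clusters = [frozenset([i]) for i in range(len(points))]
    while len(clusters) > target_clusters:
        # Pick the pair of clusters with the minimum dissimilarity.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_link(pair[0], pair[1], points))
        # Replace the two clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Made-up 2-D points used only to exercise the sketch.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, target_clusters=3)
print(sorted(sorted(c) for c in result))
```

With `target_clusters=1` the loop runs until a single cluster remains, which corresponds to running the algorithm with no early stopping criterion. Other dissimilarity measures can be swapped in by replacing `single_link`.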
The agglomerative hierarchical clustering algorithm [22] is illustrated in
Fig. 5.3 using a 2-D example dataset with 8 objects, A, B, C, ..., H, which
clearly form 3 clusters. The algorithm yields a dendrogram that shows
which two clusters are combined at each step, as shown in Fig. 5.3(b).
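The information a dendrogram carries is the sequence of combinations together with the dissimilarity at which each one happens. The sketch below records that sequence for 8 labelled objects. The coordinates are hypothetical and are not read off Fig. 5.3; they were chosen only so that D and E combine first, A and B combine at step (3), and C joins them at step (4). Single linkage with Euclidean distance is again an assumption, and `merge_history` is an illustrative name.

```python
import math
from itertools import combinations

# Hypothetical coordinates, NOT taken from Fig. 5.3.
points = {
    "A": (1.0, 1.0), "B": (1.0, 1.9), "C": (2.2, 1.5),
    "D": (6.0, 5.0), "E": (6.0, 5.5), "F": (7.5, 5.2),
    "G": (10.0, 1.0), "H": (10.7, 1.0),
}

def single_link(a, b):
    """Assumed dissimilarity: smallest Euclidean distance between members of a and b."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)

def merge_history():
    """Return one (step, cluster, cluster, dissimilarity) tuple per combination."""
    clusters = [(name,) for name in points]  # every object starts alone
    history = []
    step = 0
    while len(clusters) > 1:
        # Find the two clusters with the minimum dissimilarity.
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        step += 1
        history.append((step, a, b, single_link(a, b)))
        # Replace both clusters by their union, kept as a sorted tuple of labels.
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return history

history = merge_history()
for step, a, b, height in history:
    print(f"({step}) {''.join(a)} + {''.join(b)} at dissimilarity {height:.2f}")
```

Each recorded tuple corresponds to one combination in a dendrogram, and the last field is the dissimilarity at which that combination would be drawn. With 8 objects there are always 7 combinations before a single cluster remains.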
Fig. 5.3. Agglomerative hierarchical clustering algorithm: (a) 2-D example; (b) output dendrogram.
Figure 5.3(a) shows the 8 objects in the 2-D example dataset. Each slash
ellipse represents a combination of two clusters. The step at which the
combination happens is given by the number in parentheses on that
ellipse. For instance, the slash ellipse numbered (4) represents the combination
of two clusters at step (4): one contains objects A and B, which were combined into
one cluster at step (3), and the other contains the single object C.
The dendrogram shown in Fig. 5.3(b) illustrates more clearly how clusters are
combined over the course of the hierarchical clustering algorithm. The vertical
lines represent the remaining clusters. A horizontal line represents the combination
of two clusters. Its two ends connect the two vertical lines representing the clusters
that exist when the combination happens. The vertical location of each horizontal
line represents the dissimilarity measure at which the two clusters are combined into
one. For instance, in Fig. 5.3(b), clusters D and E are combined into one cluster
at the first step since they have the minimum dissimilarity. The horizontal line
representing this combination is therefore the lowest one in the dendrogram, and it
connects the two vertical lines representing clusters D and E. Now, we have seven clusters,
which are (A), (B), (C), (D, E), (F), (G) and (H). Then, clusters G and H are