Database Reference
In-Depth Information
problems associated with hierarchical clustering. We therefore consider “clustroids” and
the way we can represent a cluster when there is no centroid or average point in a cluster.
7.2.1
Hierarchical Clustering in a Euclidean Space
Any hierarchical clustering algorithm works as follows. We begin with every point in its
own cluster. As time goes on, larger clusters will be constructed by combining two smaller
clusters, and we have to decide in advance:
(1) How will clusters be represented?
(2) How will we choose which two clusters to merge?
(3) When will we stop combining clusters?
Once we have answers to these questions, the algorithm can be described succinctly as:
WHILE it is not time to stop DO
pick the best two clusters to merge;
combine those two clusters into one cluster;
END;
To begin, we shall assume the space is Euclidean. That allows us to represent a cluster
by its centroid or average of the points in the cluster. Note that in a cluster of one point, that
point is the centroid, so we can initialize the clusters straightforwardly. We can then use the
merging rule that the distance between any two clusters is the Euclidean distance between
their centroids, and we should pick the two clusters at the shortest distance. Other ways to
define intercluster distance are possible, and we can also pick the best pair of clusters on a
basis other than their distance. We shall discuss some options in Section 7.2.3 .
EXAMPLE 7.2 Let us see how the basic hierarchical clustering would work on the data of
Fig. 7.2 . These points live in a 2-dimensional Euclidean space, and each point is named by
its ( x, y ) coordinates. Initially, each point is in a cluster by itself and is the centroid of that
cluster. Among all the pairs of points, there are two pairs that are closest: (10,5) and (11,4)
or (11,4) and (12,3). Each is at distance Let us break ties arbitrarily and decide to com-
bine (11,4) with (12,3). The result is shown in Fig. 7.3 , including the centroid of the new
cluster, which is at (11.5, 3.5).
Search WWH ::




Custom Search