7.2.2 Efficiency of Hierarchical Clustering
The basic algorithm for hierarchical clustering is not very efficient. At each step, we must compute the distances between each pair of clusters, in order to find the best merger. The initial step takes O(n²) time, but subsequent steps take time proportional to (n−1)², (n−2)², . . . . The sum of squares up to n is O(n³), so this algorithm is cubic. Thus, it cannot be run except for fairly small numbers of points.
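To see where the cubic bound comes from, note that these per-step costs are bounded by the sum of squares

    1² + 2² + · · · + n² = n(n+1)(2n+1)/6,

which grows as n³.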
However, there is a somewhat more efficient implementation of which we should be
aware.
(1) We start, as we must, by computing the distances between all pairs of points, and this step is O(n²).
(2) Form the pairs and their distances into a priority queue, so we can always find the smallest distance in one step. This operation is also O(n²).
(3) When we decide to merge two clusters C and D, we remove all entries in the priority queue involving one of these two clusters; that requires work O(n log n) since there are at most 2n deletions to be performed, and priority-queue deletion can be performed in O(log n) time.
(4) We then compute all the distances between the new cluster and the remaining clusters. This work is also O(n log n), as there are at most n entries to be inserted into the priority queue, and insertion into a priority queue can also be done in O(log n) time.
Since the last two steps are executed at most n times, and the first two steps are executed only once, the overall running time of this algorithm is O(n² log n). That is better than O(n³), but it still puts a strong limit on how large n can be before it becomes infeasible to use this clustering approach.
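As a concrete illustration, here is a minimal Python sketch of this priority-queue approach, using distance between centroids as the merging criterion. The function name, the use of Python's heapq module, and the lazy skipping of stale heap entries (substituted here for the explicit deletions of step (3), since heapq offers no delete operation) are illustrative assumptions, not part of the text.

    import heapq
    import itertools

    def hierarchical_clustering(points, k):
        """Merge clusters by smallest centroid distance until k clusters remain.

        Only centroids and cluster sizes are tracked here; a fuller version
        would also record which points belong to each cluster.
        """
        # Each cluster id maps to (centroid, number of points in the cluster).
        clusters = {i: (tuple(p), 1) for i, p in enumerate(points)}
        next_id = itertools.count(len(points))

        def dist(c1, c2):
            return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

        # Steps (1)-(2): all pairwise distances go into a priority queue, O(n^2).
        heap = [(dist(clusters[i][0], clusters[j][0]), i, j)
                for i in clusters for j in clusters if i < j]
        heapq.heapify(heap)

        while len(clusters) > k and heap:
            d, i, j = heapq.heappop(heap)   # smallest remaining distance, O(log n)
            if i not in clusters or j not in clusters:
                continue                    # stale entry for an already-merged cluster
            # Steps (3)-(4): merge i and j, then push the distances from the new
            # cluster to every remaining cluster, O(n log n) per merge.
            (ci, ni), (cj, nj) = clusters.pop(i), clusters.pop(j)
            centroid = tuple((a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj))
            new = next(next_id)
            for other, (co, _) in clusters.items():
                heapq.heappush(heap, (dist(centroid, co), new, other))
            clusters[new] = (centroid, ni + nj)

        return clusters

Calling hierarchical_clustering(points, 1) on a list of coordinate tuples merges everything into a single cluster; recording the order of the merges along the way would recover the full tree of clusters.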
7.2.3 Alternative Rules for Controlling Hierarchical Clustering
We have seen one rule for picking the best clusters to merge: find the pair with the smallest distance between their centroids. Some other options are:
(1) Take the distance between two clusters to be the minimum of the distances between any two points, one chosen from each cluster. For example, in Fig. 7.3 we would next choose to cluster the point (10,5) with the cluster of two points, since (10,5) is at distance √2 from that cluster, and no other pair of unclustered points is that close. Note that in Example 7.2,