7.3.2 Initializing Clusters for K-Means
We want to pick points that have a good chance of lying in different clusters. There are two
approaches.
(1) Pick points that are as far away from one another as possible.
(2) Cluster a sample of the data, perhaps hierarchically, so there are k clusters. Pick a point
from each cluster, perhaps that point closest to the centroid of the cluster.
The second approach requires little elaboration; a brief sketch of it appears at the end of
this section. For the first approach, there are variations. One good choice is:
Pick the first point at random;
WHILE there are fewer than k points DO
    Add the point whose minimum distance from the selected
        points is as large as possible;
END;
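As a concrete illustration, here is a minimal Python sketch of this procedure, assuming
points are given as coordinate tuples and distances are Euclidean; the function name
init_farthest is ours, not from the text.

    import math
    import random

    def init_farthest(points, k, seed=None):
        """Pick k initial points: the first at random, then repeatedly
        the point whose minimum distance to the already-selected points
        is as large as possible."""
        rng = random.Random(seed)
        chosen = [rng.choice(points)]
        while len(chosen) < k:
            # Each candidate's "score" is its distance to the nearest
            # already-chosen point; take the candidate with the top score.
            best = max(
                (p for p in points if p not in chosen),
                key=lambda p: min(math.dist(p, c) for c in chosen),
            )
            chosen.append(best)
        return chosen

Run on the twelve points of Fig. 7.8 with (6,8) as the random first pick, this sketch
reproduces the selection traced in Example 7.8 below.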
EXAMPLE 7.8 Let us consider the twelve points of Fig. 7.2, which we reproduce here as
Fig. 7.8. In the worst case, our initial choice of a point is near the center, say (6,8). The
furthest point from (6,8) is (12,3), so that point is chosen next.
Figure 7.8 Repeat of Fig. 7.2
Among the remaining ten points, the one whose minimum distance to either (6,8) or
(12,3) is a maximum is (2,2). That point has distance √52 = 7.21 from (6,8) and distance
√101 = 10.05 from (12,3); thus its “score” is 7.21. You can check easily that any other
point is less than distance 7.21 from at least one of (6,8) and (12,3). Our selection of three
starting points is thus (6,8), (12,3), and (2,2). Notice that these three belong to different
clusters.
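The arithmetic can be checked with a short snippet using only Python's standard library:

    import math

    d1 = math.dist((2, 2), (6, 8))    # sqrt(52)  = 7.21...
    d2 = math.dist((2, 2), (12, 3))   # sqrt(101) = 10.05...
    print(round(min(d1, d2), 2))      # 7.21 -- the "score" of (2,2)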
Had we started with a different point, say (10,5), we would get a different set of three
initial points. In this case, the starting points would be (10,5), (2,2), and (4,10). Again,
these points belong to the three different clusters.
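For completeness, here is a minimal sketch of the second initialization approach as well:
hierarchically cluster a random sample into k clusters, then take the sample point nearest
each cluster's centroid. The use of scikit-learn's AgglomerativeClustering and the name
init_from_sample are our assumptions, not code from the text.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def init_from_sample(points, k, sample_size, seed=None):
        """Hierarchically cluster a random sample into k clusters, then
        return the sample point closest to each cluster's centroid."""
        rng = np.random.default_rng(seed)
        pts = np.asarray(points, dtype=float)
        idx = rng.choice(len(pts), size=min(sample_size, len(pts)),
                         replace=False)
        sample = pts[idx]
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(sample)
        chosen = []
        for c in range(k):
            members = sample[labels == c]
            centroid = members.mean(axis=0)
            # Representative: the cluster member nearest its centroid.
            nearest = members[np.argmin(
                np.linalg.norm(members - centroid, axis=1))]
            chosen.append(tuple(nearest))
        return chosen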