Pick the first point at random;
WHILE there are fewer than k points DO
    Add the point whose minimum distance from the selected
        points is as large as possible;
END;
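As an illustration (ours, not the book's), this selection loop might be realized in Python as the sketch below; the function name pick_initial_points and the use of Euclidean distance are assumptions for this example.

    import math
    import random

    def pick_initial_points(points, k):
        # Pick the first point at random, then repeatedly add the point
        # whose minimum distance to the already-selected points is largest.
        chosen = [random.choice(points)]
        while len(chosen) < k:
            best = max((p for p in points if p not in chosen),
                       key=lambda p: min(math.dist(p, c) for c in chosen))
            chosen.append(best)
        return chosen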
Example 7.8: Let us consider the twelve points of Fig. 7.2, which we reproduce here as Fig. 7.8. In the worst case, our initial choice of a point is near the center, say (6,8). The furthest point from (6,8) is (12,3), so that point is chosen next.
[Figure 7.8: Repeat of Fig. 7.2, showing the twelve points (2,2), (3,4), (4,8), (4,10), (5,2), (6,8), (7,10), (9,3), (10,5), (11,4), (12,3), and (12,6).]
Among the remaining ten points, the one whose minimum distance to either (6,8) or (12,3) is a maximum is (2,2). That point has distance √52 = 7.21 from (6,8) and distance √101 = 10.05 from (12,3); thus its "score" is 7.21. You can check easily that any other point is less than distance 7.21 from at least one of (6,8) and (12,3). Our selection of three starting points is thus (6,8), (12,3), and (2,2). Notice that these three belong to different clusters.
Had we started with a different point, say (10,5), we would get a different
set of three initial points. In this case, the starting points would be (10,5),
(2,2), and (4,10). Again, these points belong to the three different clusters.
□
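To check the example's arithmetic, one can force the worst-case first choice (6,8) rather than picking at random; the snippet below is a sketch under that assumption, reusing the selection loop from the earlier sketch.

    import math

    points = [(2,2), (3,4), (4,8), (4,10), (5,2), (6,8),
              (7,10), (9,3), (10,5), (11,4), (12,3), (12,6)]
    chosen = [(6, 8)]                  # force the worst-case first choice
    while len(chosen) < 3:
        best = max((p for p in points if p not in chosen),
                   key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(best)
    print(chosen)                      # [(6, 8), (12, 3), (2, 2)]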
7.3.3 Picking the Right Value of k
We may not know the correct value of k to use in a k-means clustering. However, if we can measure the quality of the clustering for various values of k, we can usually guess what the right value of k is. Recall the discussion in Section 7.2.3, especially Example 7.5, where we observed that if we take a measure of appropriateness for clusters, such as average radius or diameter, that value will grow slowly, as long as the number of clusters we assume remains at or above the true number of clusters. However, as soon as we try to form fewer clusters than there really are, the measure will rise precipitously.
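As a sketch of how such a measure might be computed (our illustration, not the book's code), the average diameter of a clustering can be evaluated for each candidate k; the clustering routine cluster(points, k) in the commented driver below is hypothetical and stands in for any k-means run.

    import itertools
    import math

    def average_diameter(clusters):
        # Diameter of a cluster: the largest distance between any two of
        # its points; the measure is then averaged over all clusters.
        diameters = []
        for members in clusters:
            if len(members) < 2:
                diameters.append(0.0)
            else:
                diameters.append(max(math.dist(p, q) for p, q in
                                     itertools.combinations(members, 2)))
        return sum(diameters) / len(diameters)

    # Hypothetical driver: cluster(points, k) is assumed, not defined here.
    # for k in range(1, 8):
    #     print(k, average_diameter(cluster(points, k)))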