Database Reference
In-Depth Information
we did make this combination eventually, but not until we had combined another pair
of points. In general, it is possible that this rule will result in an entirely different clus-
tering from that obtained using the distance-of-centroids rule.
(2) Take the distance between two clusters to be the average distance of all pairs of points,
one from each cluster.
(3) The radius of a cluster is the maximum distance between all the points and the
centroid. Combine the two clusters whose resulting cluster has the lowest radius. A
slight modification is to combine the clusters whose result has the lowest average dis-
tance between a point and the centroid. Another modification is to use the sum of the
squares of the distances between the points and the centroid. In some algorithms, we
shall find these variant definitions of “radius” referred to as “the radius.”
(4) The diameter of a cluster is the maximum distance between any two points of the
cluster. Note that the radius and diameter of a cluster are not related directly, as they
are in a circle, but there is a tendency for them to be proportional. We may choose to
merge those clusters whose resulting cluster has the smallest diameter. Variants of this
rule, analogous to the rule for radius, are possible.
EXAMPLE 7.4 Let us consider the cluster consisting of the five points at the right of Fig.
7.2 . The centroid of these five points is (10.8, 4.2). There is a tie for the two furthest points
from the centroid: (9,3) and (12,6), both at distance Thus, the radius is 2.16. For the
diameter, we find the two points in the cluster having the greatest distance. These are again
(9,3) and (12,6). Their distance is so that is the diameter. Notice that the diameter is
not exactly twice the radius, although it is close in this case. The reason is that the centroid
is not on the line between (9,3) and (12,6).
We also have options in determining when to stop the merging process. We already men-
tioned “stop when we have k clusters” for some predetermined k . Here are some other op-
tions.
(1) Stop if the diameter of the cluster that results from the best merger exceeds a threshold.
We can also base this rule on the radius, or on any of the variants of the radius men-
tioned above.
(2) Stop if the density of the cluster that results from the best merger is below some
threshold. The density can be defined in many different ways. Roughly, it should be
the number of cluster points per unit volume of the cluster. That ratio can be estimated
by the number of points divided by some power of the diameter or radius of the cluster.
The correct power could be the number of dimensions of the space. Sometimes, 1 or 2
is chosen as the power, regardless of the number of dimensions.
Search WWH ::




Custom Search