Database Reference
In-Depth Information
We call the representative point the clustroid . We can select the clustroid in various ways,
each designed to, in some sense, minimize the distances between the clustroid and the other
points in the cluster. Common choices include selecting as the clustroid the point that min-
imizes:
(1) The sum of the distances to the other points in the cluster.
(2) The maximum distance to another point in the cluster.
(3) The sum of the squares of the distances to the other points in the cluster.
EXAMPLE 7.7 Suppose we are using edit distance, and a cluster consists of the four points
abcd , aecdb , abecb , and ecdab . Their distances are found in the following table:
If we apply the three criteria for being the centroid to each of the four points of the
cluster, we find:
We can see from these measurements that whichever of the three criteria we choose,
aecdb will be selected as the clustroid. In general, different criteria could yield different
clustroids. □
The options for measuring the distance between clusters that were outlined in Section
7.2.3 can be applied in a non-Euclidean setting, provided we use the clustroid in place
of the centroid. For example, we can merge the two clusters whose clustroids are closest.
We could also use the average or minimum distance between all pairs of points from the
clusters.
Other suggested criteria involved measuring the density of a cluster, based on the radius
or diameter. Both these notions make sense in the non-Euclidean environment. The diamet-
er is still the maximum distance between any two points in the cluster. The radius can be
defined using the clustroid in place of the centroid. Moreover, it makes sense to use the
same sort of evaluation for the radius as we used to select the clustroid in the first place.
For example, if we take the clustroid to be the point with the smallest sum of squares of
distances to the other nodes, then define the radius to be that sum of squares (or its square
root).
Finally, Section 7.2.3 also discussed criteria for stopping the merging of clusters. None
of these criteria made direct use of the centroid, except through the notion of radius, and
we have already observed that “radius” makes good sense in non-Euclidean spaces. Thus,
Search WWH ::




Custom Search