Database Reference
In-Depth Information
EXAMPLE 7.10 Figure 7.12 is an illustration of two clusters. The inner cluster is an ordinary
circle, while the second is a ring around the circle. This arrangement is not completely
pathological. A creature from another galaxy might look at our solar system and observe
that the objects cluster into an inner circle (the planets) and an outer ring (the Kuyper belt),
with little in between.
Figure 7.12 Two clusters, one surrounding the other
7.4.1
Initialization in CURE
We begin the CURE algorithm by:
(1) Take a small sample of the data and cluster it in main memory. In principle, any clus-
tering method could be used, but as CURE is designed to handle oddly shaped clusters,
it is often advisable to use a hierarchical method in which clusters are merged when
they have a close pair of points. This issue is discussed in more detail in Example 7.11
below.
(2) Select a small set of points from each cluster to be representative points . These points
should be chosen to be as far from one another as possible, using the method described
in Section 7.3.2 .
(3) Move each of the representative points a fixed fraction of the distance between its loc-
ation and the centroid of its cluster. Perhaps 20% is a good fraction to choose. Note
that this step requires a Euclidean space, since otherwise, there might not be any no-
tion of a line between two points.
EXAMPLE 7.11 We could use a hierarchical clustering algorithm on a sample of the data
from Fig. 7.12 . If we took as the distance between clusters the shortest distance between
any pair of points, one from each cluster, then we would correctly find the two clusters.
That is, pieces of the ring would stick together, and pieces of the inner circle would
stick together, but pieces of ring would always be far away from the pieces of the circle.
Note that if we used the rule that the distance between clusters was the distance between
their centroids, then we might not get the intuitively correct result. The reason is that the
centroids of both clusters are in the center of the diagram.
Search WWH ::




Custom Search