Databases Reference
In-Depth Information
Figure 7.12: Two clusters, one surrounding the other
Example 7.10 : Figure 7.12 is an illustration of two clusters. The inner clus-
ter is an ordinary circle, while the second is a ring around the circle. This
arrangement is not completely pathological. A creature from another galaxy
might look at our solar system and observe that the objects cluster into an inner
circle (the planets) and an outer ring (the Kuyper belt), with little in between.
2
7.4.1 Initialization in CURE
We begin the CURE algorithm by:
1. Take a small sample of the data and cluster it in main memory. In prin-
ciple, any clustering method could be used, but as CURE is designed to
handle oddly shaped clusters, it is often advisable to use a hierarchical
method in which clusters are merged when they have a close pair of points.
This issue is discussed in more detail in Example 7.11 below.
2. Select a small set of points from each cluster to be representative points.
These points should be chosen to be as far from one another as possible,
using the method described in Section 7.3.2.
3. Move each of the representative points a fixed fraction of the distance
between its location and the centroid of its cluster. Perhaps 20% is a
good fraction to choose. Note that this step requires a Euclidean space,
since otherwise, there might not be any notion of a line between two
points.
Example 7.11 : We could use a hierarchical clustering algorithm on a sample
of the data from Fig. 7.12. If we took as the distance between clusters the
shortest distance between any pair of points, one from each cluster, then we
would correctly find the two clusters.
That is, pieces of the ring would stick
Search WWH ::




Custom Search