7.3.6 Exercises for Section 7.3
EXERCISE 7.3.1 For the points of Fig. 7.8, if we select three starting points using the method of Section 7.3.2, and the first point we choose is (3,4), which other points are selected?

! EXERCISE 7.3.2 Prove that no matter what point we start with in Fig. 7.8, if we choose three starting points by the method of Section 7.3.2, we obtain points in each of the three clusters.

Hint: You could solve this exhaustively by beginning with each of the twelve points in turn. However, a more generally applicable solution is to consider the diameters of the three clusters and also the minimum intercluster distance, that is, the minimum distance between two points chosen from two different clusters. Can you prove a general theorem based on these two parameters of a set of points?
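The initialization method of Section 7.3.2 (choose a first point, then repeatedly add the point whose minimum distance to the already-chosen points is greatest) can be sketched as follows. The point set and function name here are illustrative, not the points of Fig. 7.8:

```python
import math

def farthest_first(points, k, first):
    """Pick k starting points: begin with `first`, then repeatedly add
    the point whose minimum distance to the chosen points is largest."""
    chosen = [first]
    while len(chosen) < k:
        candidates = [p for p in points if p not in chosen]
        # For each candidate, compute its distance to the nearest chosen
        # point; take the candidate for which that distance is greatest.
        chosen.append(max(candidates,
                          key=lambda p: min(math.dist(p, c) for c in chosen)))
    return chosen

# Illustrative points forming three loose groups.
pts = [(0, 0), (1, 1), (10, 0), (11, 1), (5, 10), (6, 11)]
print(farthest_first(pts, 3, (0, 0)))  # one point from each group
```

Running the sketch on these points starting from (0, 0) selects one point from each of the three groups, which is the behavior Exercise 7.3.2 asks you to prove in general.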
! EXERCISE 7.3.3 Give an example of a dataset and a selection of k initial centroids such that when the points are reassigned to their nearest centroid at the end, at least one of the initial k points is reassigned to a different cluster.
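The reassignment step the exercise refers to, assigning every point to its nearest centroid, might be sketched as follows; the points and centroids are made up for illustration:

```python
import math

def assign(points, centroids):
    """Map each centroid index to the list of points nearest to it."""
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

# Two initial centroids that are themselves data points.
pts = [(0, 0), (4, 0), (5, 0), (6, 0)]
print(assign(pts, [(0, 0), (4, 0)]))
```

To construct the example the exercise asks for, you would iterate this step with recomputed centroids and look for an initial point that ends up nearer the other cluster's new centroid.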
EXERCISE 7.3.4 For the three clusters of Fig. 7.8:

(a) Compute the representation of the cluster as in the BFR Algorithm. That is, compute N, SUM, and SUMSQ.

(b) Compute the variance and standard deviation of each cluster in each of the two dimensions.
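The BFR summary of part (a) and the per-dimension variance of part (b) can be computed as below; the sample points are illustrative, not the clusters of Fig. 7.8:

```python
def bfr_summary(points):
    """Return the BFR representation (N, SUM, SUMSQ) of a set of points."""
    d = len(points[0])
    n = len(points)
    s = [sum(p[i] for p in points) for i in range(d)]       # SUM per dimension
    sq = [sum(p[i] ** 2 for p in points) for i in range(d)]  # SUMSQ per dimension
    return n, s, sq

def variance(n, s, sq):
    """Per-dimension variance: SUMSQ_i / N - (SUM_i / N)**2."""
    return [sq_i / n - (s_i / n) ** 2 for s_i, sq_i in zip(s, sq)]

n, s, sq = bfr_summary([(1, 2), (3, 4)])
print(n, s, sq)            # 2 [4, 6] [10, 20]
print(variance(n, s, sq))  # [1.0, 1.0]
```

Note that the summary is additive: merging two clusters just adds their N, SUM, and SUMSQ componentwise, which is why BFR keeps exactly these statistics.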
EXERCISE 7.3.5 Suppose a cluster of three-dimensional points has standard deviations of 2, 3, and 5, in the three dimensions, in that order. Compute the Mahalanobis distance between the origin (0, 0, 0) and the point (1, −3, 4).
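With standard deviation σ_i in each dimension, the Mahalanobis distance used here is the square root of the sum of the squared, per-dimension normalized differences. A minimal sketch (the function name is ours):

```python
import math

def mahalanobis(p, c, stds):
    """Mahalanobis distance between point p and centroid c, given the
    cluster's standard deviation in each dimension."""
    return math.sqrt(sum(((pi - ci) / s) ** 2
                         for pi, ci, s in zip(p, c, stds)))

# A point exactly one standard deviation away in a single dimension
# is at Mahalanobis distance 1.
print(mahalanobis((2, 0, 0), (0, 0, 0), (2, 3, 5)))  # 1.0
```

Plugging the exercise's point and standard deviations into this function gives the requested distance.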
7.4 The CURE Algorithm
We now turn to another large-scale clustering algorithm in the point-assignment class. This algorithm, called CURE (Clustering Using REpresentatives), assumes a Euclidean space. However, it does not assume anything about the shape of clusters; they need not be normally distributed, and can even have strange bends, S-shapes, or rings. Instead of representing clusters by their centroid, CURE uses a collection of representative points, as the name implies.
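As a preview of the idea, CURE forms its representatives by picking points of a cluster that are scattered far from one another and then moving each a fixed fraction of the way toward the centroid. A minimal sketch under those assumptions (the shrink fraction and point set are illustrative):

```python
import math

def representatives(cluster, k=2, shrink=0.2):
    """Pick up to k points of the cluster that are far from one another,
    then move each a fraction of the distance toward the centroid."""
    d = len(cluster[0])
    centroid = [sum(p[i] for p in cluster) / len(cluster) for i in range(d)]
    # First representative: the point farthest from the centroid.
    reps = [max(cluster, key=lambda p: math.dist(p, centroid))]
    # Each further representative: the point farthest from those chosen.
    while len(reps) < k:
        reps.append(max(cluster,
                        key=lambda p: min(math.dist(p, r) for r in reps)))
    # Shrink each representative toward the centroid.
    return [tuple(r[i] + shrink * (centroid[i] - r[i]) for i in range(d))
            for r in reps]

print(representatives([(0, 0), (0, 2), (10, 0), (10, 2)]))
```

Because the shrunken representatives trace the cluster's boundary rather than collapsing to a single center, CURE can follow the elongated and ring-shaped clusters mentioned above.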