Database Reference
In-Depth Information
computing the square of the length of the path to each q as the sum of the squares of the
three legs.
We must then finish computing the features for the merged cluster. We need to consider
all the points in the merged cluster for which we know the rowsum. These are, the centroids
of the two clusters, the k points closest to the clustroids for each cluster, and the k points
furthest from the clustroids for each cluster, with the exception of the point that was chosen
as the new clustroid. We can compute the distances from the new clustroid for each of these
4 k + 1 points. We select the k with the smallest distances as the “close” points and the k
with the largest distances as the “far” points. We can then compute the rowsums for the
chosen points, using the same formulas above that we used to compute the rowsums for the
candidate clustroids.
7.5.5
Exercises for Section 7.5
EXERCISE 7.5.1 Using the cluster representation of Section 7.5.1 , represent the twelve
points of Fig. 7.8 as a single cluster. Use parameter k = 2 as the number of close and distant
points to be included in the representation. Hint : Since the distance is Euclidean, we can
get the square of the distance between two points by taking the sum of the squares of the
differences along the x- and y-axes.
EXERCISE 7.5.2 Compute the radius, in the sense used by the GRGPF Algorithm (square
root of the average square of the distance from the clustroid) for the cluster that is the five
points in the lower right of Fig. 7.8 . Note that (11,4) is the clustroid.
7.6 Clustering for Streams and Parallelism
In this section, we shall consider briefly how one might cluster a stream. The model we
have in mind is one where there is a sliding window (recall Section 4.1.3 ) of N points, and
we can ask for the centroids or clustroids of the best clusters formed from the last m of these
points, for any m N . We also study a similar approach to clustering a large, fixed set of
points using MapReduce on a computing cluster (no pun intended). This section provides
only a rough outline to suggest the possibilities, which depend on our assumptions about
how clusters evolve in a stream.
Search WWH ::




Custom Search