Databases Reference
In-Depth Information
refers. We use c 1 and c 2 for the clustroids of C 1 and C 2 , respectively.
In detail, we compute the sum of the squares of the distances from p to all
the nodes in the combined cluster C by beginning with ROWSUM C 1 (p) to get
the terms for the points in the same cluster as p. For the N C 2 points q in C 2 ,
we consider the path from p to the clustroid of C 1 and then to C 2 , and finally
to q. We assume there is a right angle between the legs from p to c 1 and c 1 to
c 2 , and another right angle between the shortest path from from p to c 2 and the
leg from c 2 to q. We then use the Pythagorean theorem to justify computing
the square of the length of the path to each q as the sum of the squares of the
three legs.
We must then finish computing the features for the merged cluster. We
need to consider all the points in the merged cluster for which we know the
rowsum. These are, the centroids of the two clusters, the k points closest to the
clustroids for each cluster, and the k points furthest from the clustroids for each
cluster, with the exception of the point that was chosen as the new clustroid.
We can compute the distances from the new clustroid for each of these 4k + 1
points. We select the k with the smallest distances as the “close” points and
the k with the largest distances as the “far” points. We can then compute the
rowsums for the chosen points, using the same formulas above that we used to
compute the rowsums for the candidate clustroids.
7.5.5 Exercises for Section 7.5
Exercise 7.5.1 : Using the cluster representation of Section 7.5.1, represent
the twelve points of Fig. 7.8 as a single cluster. Use parameter k = 2 as the
number of close and distant points to be included in the representation. Hint :
Since the distance is Euclidean, we can get the square of the distance between
two points by taking the sum of the squares of the differences along the x- and
y-axes.
Exercise 7.5.2 : Compute the radius, in the sense used by the GRGPF Algo-
rithm (square root of the average square of the distance from the clustroid) for
the cluster that is the five points in the lower right of Fig. 7.8. Note that (11,4)
is the clustroid.
7.6
Clustering for Streams and Parallelism
In this section, we shall consider briefly how one might cluster a stream. The
model we have in mind is one where there is a sliding window (recall Sec-
tion 4.1.3) of N points, and we can ask for the centroids or clustroids of the
best clusters formed from the last m of these points, for any m≤N . We also
study a similar approach to clustering a large, fixed set of points using map-
reduce on a computing cluster (no pun intended). This section provides only
a rough outline to suggest the possibilities, which depend on our assumptions
about how clusters evolve in a stream.
Search WWH ::




Custom Search