Databases Reference
In-Depth Information
i. The radius of the circle is c.
ii. The inner and outer circles forming the ring have radii i and o, respec-
tively.
iii. All representative points for the two clusters are on the boundaries of the
clusters.
iv. Representative points are moved 20% of the distance from their initial
position toward the centroid of their cluster.
v. Clusters are merged if, after repositioning, there are representative points
from the two clusters at distance d or less.
In terms of d, c, i, and o, under what circumstances will the ring and circle be
merged into a single cluster?
7.5
Clustering in Non-Euclidean Spaces
We shall next consider an algorithm that handles non-main-memory data, but
does not require a Euclidean space. The algorithm, which we shall refer to as
GRGPF for its authors (V. Ganti, R. Ramakrishnan, J. Gehrke, A. Powell, and
J. French), takes ideas from both hierarchical and point-assignment approaches.
Like CURE, it represents clusters by sample points in main memory. However,
it also tries to organize the clusters hierarchically, in a tree, so a new point can
be assigned to the appropriate cluster by passing it down the tree. Leaves of
the tree hold summaries of some clusters, and interior nodes hold subsets of the
information describing the clusters reachable through that node. An attempt
is made to group clusters by their distance from one another, so the clusters at
a leaf are close, and the clusters reachable from one interior node are relatively
close as well.
7.5.1
Representing Clusters in the GRGPF Algorithm
As we assign points to clusters, the clusters can grow large. Most of the points
in a cluster are stored on disk, and are not used in guiding the assignment of
points, although they can be retrieved. The representation of a cluster in main
memory consists of several features. Before listing these features, if p is any
point in a cluster, let ROWSUM (p) be the sum of the squares of the distances
from p to each of the other points in the cluster. Note that, although we are not
in a Euclidean space, there is some distance measure d that applies to points,
or else it is not possible to cluster points at all.
1. N , the number of points in the cluster.
2. The clustroid of the cluster, which is defined specifically to be the point
in the cluster that minimizes the sum of the squares of the distances to
Search WWH ::




Custom Search