Biology Reference
In-Depth Information
end for
return M
end
9.3.3. Iterative Phase
We begin by choosing a random set of k points from M . Then we iteratively
replace the bad medoids in the current best medoids set with random points from
M until the current best medoids set does not change after a certain number of
replacements have been tried.
In each iteration, we first find dimensions for each medoid in the set, and form
the cluster corresponding to each medoid. Then we evaluate the clustering and
replace the bad medoids in the current best medoids set if the new clustering is
better.
In order to find dimensions, we first define several notations. For each medoid
m i , δ i is the minimum distance from any other medoids to m i based on full di-
mensions. We define the locality L i to be the set of points within the distance of
δ i from m i and define X i,j to be the average distance to m i along dimension j .
We then calculate X i,j as follows. We divide the average distance from the points
in L i to m i along dimension j by the normalization factor n j . There are two con-
straints when associating dimensions to medoids. The total number of dimensions
associated to medoids must be equal to k
l since k is the number of clusters and
l is the average number of dimensions in a cluster. The number of dimensions
associated with each medoid must be at least 2, which makes each cluster mean-
×
ingful. For each medoid i , we compute the mean Y i = j =1 X i,j d ,and
the standard deviation σ i = j ( X i,j
1) of the values X i,j . Y i
represents the average modified Manhattan segmental distance of the points in L i
relative to the entire space. Thus Z i,j = ( X i,j
Y i ) 2 / ( d
Y i ) i indicates how the average
distance along dimension j associated with the medoid m i is related to the aver-
age modified Manhattan segmental distance associated with the same medoid. We
decide dimensions for all clusters by picking the smallest Z i,j values by a greedy
algorithm [10] used in PROCLUS such that a total of k
l values are chosen and
at least 2 values are chosen for each medoid. More specifically, all the Z i,j values
are sorted in increasing order. We assign the two smallest for each i , which gives a
total of 2 k values, and then greedily pick the remaining lowest ( l− 2) values.
We do a single pass over the database to assign the points to the medoids.
For each i , we compute the modified Manhattan segmental distance relative to D i
between a point and the medoid m i , and assign the point to the closest medoid.
×
Search WWH ::




Custom Search