9.3.4. Refinement Phase
In the refinement phase, we repeat the dimension computation of the iterative phase
once more, this time using the data points assigned to each cluster at the end of the
iterative phase rather than the localities of the medoids. Once the new dimensions
have been computed, we reassign the points to the medoids with respect to these new
sets of dimensions.
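The refinement pass can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual PROCLUS definitions, which this section does not restate (X_{i,j} as the average distance along dimension j from a cluster's points to its medoid, Z_{i,j} as the standardized X_{i,j}, at least two dimensions per cluster, and assignment by Manhattan segmental distance). The function names find_dimensions and refine are hypothetical, and every cluster is assumed to be non-empty.

```python
import numpy as np

def find_dimensions(X_avg, l):
    """Pick k*l (cluster, dimension) pairs with the smallest Z, at least 2 per cluster.

    X_avg[i, j] is the average distance along dimension j in cluster i.
    Assumes l >= 2, as in the usual PROCLUS setting.
    """
    k, d = X_avg.shape
    Y = X_avg.mean(axis=1, keepdims=True)
    sigma = np.sqrt(((X_avg - Y) ** 2).sum(axis=1, keepdims=True) / (d - 1))
    sigma[sigma == 0] = 1.0  # guard against a cluster with identical spreads
    Z = (X_avg - Y) / sigma

    # First give every cluster its two dimensions with the smallest Z.
    order = np.argsort(Z, axis=1)
    dims = [set(int(j) for j in order[i, :2]) for i in range(k)]

    # Then spend the remaining budget on the globally smallest Z values.
    remaining = max(0, k * l - 2 * k)
    rest = sorted(
        (float(Z[i, j]), i, j)
        for i in range(k) for j in range(d) if j not in dims[i]
    )
    for _, i, j in rest[:remaining]:
        dims[i].add(j)
    return [sorted(s) for s in dims], Z

def refine(points, medoids, labels, l):
    """One refinement pass: recompute dimensions from clusters, then reassign."""
    k = len(medoids)
    # Use the points of each cluster (not the medoid localities) to get X_avg.
    X_avg = np.vstack([
        np.abs(points[labels == i] - medoids[i]).mean(axis=0) for i in range(k)
    ])
    dims, _ = find_dimensions(X_avg, l)
    # Manhattan segmental distance: mean absolute difference over D_i only.
    dist = np.stack([
        np.abs(points[:, dims[i]] - medoids[i][dims[i]]).mean(axis=1)
        for i in range(k)
    ], axis=1)
    return dims, dist.argmin(axis=1)
```

Here points is an (n, d) array, medoids a (k, d) array, and labels the cluster index of each point at the end of the iterative phase; the function returns the new dimension sets and the reassigned labels.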
9.3.4.1. Dimension Tuning Process
In PROCLUS, users must specify the average number of dimensions per cluster, denoted
as l. Although PROCLUS allows different clusters to have different subsets of
dimensions, the number of dimensions in each cluster is still governed by l, which is
not flexible enough. According to criterion 2 discussed in Section 1, we want the
number of attributes in a cluster to be as large as possible. We therefore propose one
more step at the end of the refinement phase to reduce the dependence on l. In this
step, for each cluster i, we take the dimension with the smallest Z_{i,j} value among
the dimensions not chosen in previous steps and add it to the cluster's dimension set
to see whether the resulting cluster is better. If it is, we keep the newly added
dimension and repeat the process to try to add more dimensions; otherwise, we discard
the dimension and stop trying for this cluster. This process is carried out by the
DimensionTuning algorithm, whose pseudo-code is given in Algorithm 9.4. The quality of
a cluster is evaluated by a combination of criteria 1 and 3: a user-defined threshold
is set for criterion 3, and only clusters that pass the threshold are evaluated by
criterion 1. In this way, we introduce criteria 2 and 3 into the algorithm, which
gives a more balanced result.
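The greedy tuning step described above can be sketched as follows. This is a hedged illustration under several assumptions that the text does not settle: the "average distance to centroid" is taken as the mean absolute deviation from the centroid, averaged over the cluster's dimensions so that values for dimension sets of different sizes are comparable; the best value is updated whenever a dimension is accepted; and points are assigned by Manhattan segmental distance to the medoids. The names assign, avg_dist_to_centroid, dimension_tuning, and min_points are hypothetical.

```python
import numpy as np

def assign(points, medoids, dims):
    """Assign each point to the medoid with the smallest segmental distance."""
    dist = np.stack([
        np.abs(points[:, D] - m[D]).mean(axis=1) for m, D in zip(medoids, dims)
    ], axis=1)
    return dist.argmin(axis=1)

def avg_dist_to_centroid(points, labels, i, D):
    """Mean absolute deviation from the centroid of cluster i over dimensions D."""
    members = points[labels == i][:, D]
    if len(members) == 0:
        return np.inf
    return np.abs(members - members.mean(axis=0)).mean()

def dimension_tuning(points, medoids, dims, Z, min_points):
    """Greedily add dimensions to each cluster while the cluster keeps improving."""
    dims = [sorted(D) for D in dims]
    labels = assign(points, medoids, dims)
    d = points.shape[1]
    # Baseline quality of every cluster before any dimension is added.
    best = [avg_dist_to_centroid(points, labels, i, dims[i]) for i in range(len(dims))]

    for i in range(len(dims)):
        while len(dims[i]) < d:
            # Candidate: the unused dimension with the smallest Z_{i,j}.
            unused = [j for j in range(d) if j not in dims[i]]
            j = min(unused, key=lambda c: Z[i, c])
            trial = dims[:i] + [sorted(dims[i] + [j])] + dims[i + 1:]
            trial_labels = assign(points, medoids, trial)
            value = avg_dist_to_centroid(points, trial_labels, i, trial[i])
            # Keep the dimension only if the cluster is tighter and large enough.
            if value < best[i] and (trial_labels == i).sum() > min_points:
                dims, labels, best[i] = trial, trial_labels, value
            else:
                break  # discard the dimension and stop trying for this cluster
    return dims, labels
```

The function returns the tuned dimension sets together with the labels that correspond to them, so a rejected trial never leaks into the final assignment.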
Algorithm 9.4. DimensionTuning(M_best, C_1, ..., C_k, D_1, D_2, ..., D_k)
begin
    for each cluster C_i do
        bestEvaluateValue = the average distance to centroid in C_i
    for each cluster C_i do
        isGood = false;
        repeat
            add the dimension j (j ∉ D_i) with the smallest Z_{i,j} value to D_i
            reassign the points to clusters C_1, ..., C_k according to the new D_i
            newEvaluateValue = the average distance to centroid in C_i
            if newEvaluateValue < bestEvaluateValue and the number of points in
               C_i is more than a threshold value then