A Projected Clustering Algorithm and Its Biomedical Application - Clustering Challenges in Biological Network

9.3.4. Refinement Phase

In the refinement phase, we recompute the dimensions for each medoid once, as in the iterative phase, but this time using the data points assigned to each cluster at the end of the iterative phase rather than the localities of the medoids. Once the new dimensions have been computed, we reassign the points to the medoids relative to these new sets of dimensions.
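The reassignment step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Manhattan segmental distance used by the original PROCLUS algorithm (the mean absolute difference over a cluster's own dimensions), which this section does not restate, and the function names `segmental_distance` and `assign_points` are hypothetical.

```python
def segmental_distance(p, q, dims):
    """Mean absolute difference between p and q over the given dimensions.

    Assumption: PROCLUS-style Manhattan segmental distance.
    """
    return sum(abs(p[j] - q[j]) for j in dims) / len(dims)


def assign_points(points, medoids, dim_sets):
    """Assign each point to the medoid that is nearest when the distance
    is measured only in that medoid's own dimension set."""
    labels = []
    for p in points:
        nearest = min(
            range(len(medoids)),
            key=lambda i: segmental_distance(p, medoids[i], dim_sets[i]),
        )
        labels.append(nearest)
    return labels


# Toy data: the third coordinate is noise and is ignored by both clusters.
points = [[0, 0, 9], [0.1, 0, -9], [5, 5, 0], [5, 5.2, 7]]
medoids = [[0, 0, 0], [5, 5, 0]]
dim_sets = [[0, 1], [0, 1]]
labels = assign_points(points, medoids, dim_sets)
```

Because each medoid is compared in its own subspace, dividing by the number of dimensions keeps distances comparable between clusters whose dimension sets differ in size.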

9.3.4.1. Dimension Tuning Process

PROCLUS requires users to specify the average number of dimensions per cluster, denoted l. Although PROCLUS allows different clusters to have different subsets of dimensions, the number of dimensions in each cluster is still governed by l, which is not flexible enough. According to criterion 2 discussed in Section 1, we want the number of attributes in a cluster to be as large as possible. Therefore, we propose one more step at the end of the refinement phase to reduce the dependence on l. In this step, for each cluster i, we choose the dimension with the smallest Z_{i,j} value among the dimensions not chosen in previous steps and add it to the cluster's dimension set to see whether the resulting cluster is better. If it is, we keep the newly added dimension and repeat the process to try to add more dimensions; otherwise, we discard the dimension and stop trying for this cluster. This process is carried out by the DimensionTuning algorithm, whose pseudo-code is given in Algorithm 9.4. The quality of a cluster is evaluated by a combination of criteria 1 and 3: a user-defined threshold is set for criterion 3, and clusters that pass the threshold are then evaluated by criterion 1. In this way, we introduce criteria 2 and 3 into the algorithm, which gives a more balanced result.

Algorithm 9.4. DimensionTuning(M_best, C_1, ..., C_k, D_1, D_2, ..., D_k)

begin
    for each cluster C_i do
        bestEvaluateValue = the average distance to centroid in C_i
    for each cluster C_i do
        isGood = false;
        repeat
            add the dimension j (j ∉ D_i) with the smallest Z_{i,j} value to D_i
            reassign the points to clusters C_1, ..., C_k according to the new D_i
            newEvaluateValue = the average distance to centroid in C_i
            if newEvaluateValue < bestEvaluateValue and the number of points in C_i is more than a threshold value then
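The tuning loop described in this section can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' code: the accept/reject branch follows the prose description (keep the dimension if the cluster improves, otherwise discard it and stop), distances are assumed to be Manhattan segmental distances over a cluster's own dimensions, the Z values are taken as a precomputed input `z[i][j]`, and all function and parameter names (`dimension_tuning`, `min_points`, and so on) are hypothetical.

```python
def segmental_distance(p, q, dims):
    # Assumption: mean absolute difference over the cluster's dimensions.
    return sum(abs(p[j] - q[j]) for j in dims) / len(dims)


def assign(points, medoids, dim_sets):
    # Each point goes to the medoid nearest in that medoid's own subspace.
    return [
        min(range(len(medoids)),
            key=lambda i: segmental_distance(p, medoids[i], dim_sets[i]))
        for p in points
    ]


def avg_dist_to_centroid(points, labels, i, dims):
    # Average distance from the members of cluster i to their centroid.
    members = [p for p, lab in zip(points, labels) if lab == i]
    if not members:
        return float("inf")
    centroid = [sum(col) / len(members) for col in zip(*members)]
    return sum(segmental_distance(p, centroid, dims) for p in members) / len(members)


def dimension_tuning(points, medoids, dim_sets, z, min_points):
    """Greedily add dimensions to each cluster while the cluster improves.

    z[i][j] is the precomputed Z value of dimension j for cluster i;
    min_points is the user-defined size threshold (criterion 3).
    """
    k, n_dims = len(medoids), len(points[0])
    dim_sets = [sorted(d) for d in dim_sets]
    labels = assign(points, medoids, dim_sets)
    best = [avg_dist_to_centroid(points, labels, i, dim_sets[i]) for i in range(k)]
    for i in range(k):
        while True:
            unused = [j for j in range(n_dims) if j not in dim_sets[i]]
            if not unused:
                break
            j = min(unused, key=lambda d: z[i][d])  # smallest Z value first
            trial_sets = dim_sets[:i] + [dim_sets[i] + [j]] + dim_sets[i + 1:]
            trial_labels = assign(points, medoids, trial_sets)
            value = avg_dist_to_centroid(points, trial_labels, i, trial_sets[i])
            if value < best[i] and trial_labels.count(i) > min_points:
                # Better and large enough: keep the dimension and try another.
                dim_sets, labels, best[i] = trial_sets, trial_labels, value
            else:
                break  # Discard the dimension and stop tuning this cluster.
    return dim_sets, labels
```

In this sketch a rejected trial leaves both the dimension sets and the point assignment unchanged, so the discarded dimension has no lasting effect on the other clusters.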
