A Projected Clustering Algorithm and Its Biomedical Application - Clustering Challenges in Biological Network


iterative phase terminates and the best k medoids are reported. In the refinement phase, the process of the iterative phase is repeated once, this time using the data points assigned to the resulting clusters. Once the new dimensions have been computed, the points are reassigned to the medoids with respect to these new sets of dimensions. Outliers are also handled in this phase. The algorithm returns a partition of the data points, together with the set of dimensions on which the points in each cluster are correlated. Nevertheless, the problem of pre-selecting user parameters remains unsolved. The algorithm also relies on random sampling in the initialization phase, so small clusters are likely to be missed.
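The effect of random sampling on small clusters can be made concrete with a short calculation. The sketch below is illustrative only: it assumes points are drawn uniformly and independently (with replacement), and the function name, the sample size of 50, and the cluster fractions are hypothetical values chosen for the example, not parameters taken from the algorithm.

```python
def miss_probability(cluster_fraction: float, sample_size: int) -> float:
    """Probability that a uniform random sample contains no point of a cluster.

    Assumes independent draws with replacement, so each draw misses the
    cluster with probability (1 - cluster_fraction).
    """
    if not 0.0 <= cluster_fraction <= 1.0:
        raise ValueError("cluster_fraction must lie in [0, 1]")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return (1.0 - cluster_fraction) ** sample_size


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    for fraction in (0.20, 0.05, 0.01):
        p = miss_probability(fraction, sample_size=50)
        print(f"cluster holding {fraction:.0%} of the points: "
              f"missed with probability {p:.4f}")
```

Under these assumptions, a cluster holding 1% of the points is absent from a 50-point sample more often than not, whereas a cluster holding 20% is almost never absent.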

Fig. 9.2. Manhattan Segmental Distance (figure not reproduced; labels: X, Y, Z, X1, X2, a, b)
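As a minimal sketch, assuming the usual definition of the Manhattan segmental distance (the Manhattan distance restricted to a set of dimensions D, divided by |D|), the reassignment of points to medoids with respect to per-cluster dimension sets might look as follows. The function names and the sample data are illustrative assumptions, not part of the original algorithm description.

```python
from typing import List, Sequence


def manhattan_segmental_distance(
    x1: Sequence[float], x2: Sequence[float], dims: Sequence[int]
) -> float:
    """Average absolute coordinate difference over the dimensions in `dims`."""
    if not dims:
        raise ValueError("dims must contain at least one dimension index")
    return sum(abs(x1[d] - x2[d]) for d in dims) / len(dims)


def assign_to_medoids(
    points: Sequence[Sequence[float]],
    medoids: Sequence[Sequence[float]],
    dim_sets: Sequence[Sequence[int]],
) -> List[int]:
    """Label each point with the index of its nearest medoid.

    The distance to medoid i is measured only on dim_sets[i], the
    dimensions associated with cluster i.
    """
    labels = []
    for p in points:
        distances = [
            manhattan_segmental_distance(p, m, dims)
            for m, dims in zip(medoids, dim_sets)
        ]
        labels.append(min(range(len(distances)), key=distances.__getitem__))
    return labels


if __name__ == "__main__":
    # Hypothetical 3-dimensional data with two medoids.
    medoids = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
    dim_sets = [[0, 1], [1, 2]]  # cluster 0 uses dims 0 and 1; cluster 1 uses dims 1 and 2
    points = [(1.0, 1.0, 9.0), (0.0, 9.0, 10.0)]
    print(assign_to_medoids(points, medoids, dim_sets))
```

Dividing by the number of dimensions keeps distances comparable between clusters whose dimension sets differ in size, which is why each point can be compared against medoids that use different numbers of dimensions.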

ORCLUS (arbitrarily ORiented projected CLUSter generation) [2] finds clusters in arbitrarily oriented subspaces, because real data often contain inter-attribute correlations, which lead to projections that are not parallel to the original axis system. It also requires two user parameters: the number of clusters k and the dimensionality l of the subspace for each cluster. ORCLUS modifies the PROCLUS algorithm by adding a cluster-merging process and by having each cluster select principal components instead of attributes. It improves on PROCLUS in that it can construct clusters in arbitrarily aligned subspaces of lower dimensionality. However, ORCLUS requires all projected clusters to exist in the same number of dimensions, and it also relies on random sampling in the initialization phase. Moreover, like CLIQUE and PROCLUS, it still needs user parameters, although the method does offer guidance for finding a good value of l.

9.3. The IPROCLUS Algorithm

Our algorithm, IPROCLUS, is based on PROCLUS. It takes the number of clusters k and the average number of dimensions l in a cluster as inputs. It has three phases: an initialization phase, an iterative phase, and a cluster refinement phase. Compared with PROCLUS, we propose a modified Manhattan segmental distance that is more accurate and meaningful in projected clustering. We add one more
