iterative phase terminates and the best k medoids are reported. In the refinement
phase, the computation of the iterative phase is repeated once, this time using the
data points assigned to the resulting clusters. Once the new dimensions have been
computed, the points are reassigned to the medoids with respect to these new sets of
dimensions. Outliers are also handled in this phase. The algorithm returns a partition
of the data points, together with the set of dimensions on which the data points in
each cluster are correlated. Nevertheless, the problem of pre-selecting user
parameters remains unsolved. The algorithm also relies on random sampling in the
initialization phase, so small clusters are likely to be missed.
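The reassignment step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Manhattan segmental distance of the original PROCLUS paper (the mean absolute difference over a cluster's selected dimensions), and the names segmental_distance, assign_points, and dim_sets are hypothetical. Outlier handling is omitted.

```python
def segmental_distance(p, q, dims):
    """Manhattan segmental distance: mean absolute difference over `dims`."""
    if not dims:
        raise ValueError("dims must be non-empty")
    return sum(abs(p[d] - q[d]) for d in dims) / len(dims)


def assign_points(points, medoids, dim_sets):
    """Assign each point to the medoid that is nearest when the distance
    to medoid i is measured only on that medoid's own dimension set."""
    labels = []
    for p in points:
        distances = [
            segmental_distance(p, m, dims)
            for m, dims in zip(medoids, dim_sets)
        ]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical toy data: two medoids in 3-D, each with its own 2 dimensions.
points = [(1.0, 9.0, 0.5), (8.0, 2.0, 7.5)]
medoids = [(1.0, 1.0, 1.0), (8.0, 8.0, 8.0)]
dim_sets = [[0, 2], [1, 2]]
labels = assign_points(points, medoids, dim_sets)
```

Dividing by the number of dimensions keeps distances comparable between clusters whose dimension sets differ in size.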
Fig. 9.2. Manhattan Segmental Distance (axes X, Y, Z; points X1, X2; labels a, b)
ORCLUS (arbitrarily ORiented projected CLUSter generation) [2] finds clusters in
arbitrarily oriented subspaces, because real data often contain inter-attribute
correlations, which lead to projections that are not parallel to the original axis
system. It also requires two user parameters: the number of clusters k and the
number of dimensions l of each cluster. ORCLUS modifies the PROCLUS algorithm by
adding a cluster-merging process and by having each cluster select principal
components instead of attributes. It improves on PROCLUS in that it can construct
clusters in arbitrarily aligned subspaces of lower dimensionality. However, ORCLUS
requires all projected clusters to exist in the same number of dimensions, and it
also relies on random sampling in the initialization phase. Moreover, like CLIQUE
and PROCLUS, it still needs user parameters, although the method does offer
guidance on finding a good value of l.
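To illustrate what selecting principal components instead of attributes can look like, here is a rough sketch. It assumes, as in the original ORCLUS paper, that a cluster keeps the l eigenvectors of its covariance matrix with the smallest eigenvalues, that is, the directions along which its points are most tightly packed. The function names are hypothetical, NumPy is assumed to be available, and the merging process is not shown.

```python
import numpy as np


def least_spread_subspace(cluster_points, l):
    """Return l orthonormal directions (as columns) along which the
    cluster's points vary least."""
    X = np.asarray(cluster_points, dtype=float)
    cov = np.cov(X, rowvar=False)            # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, :l]                    # keep the l smallest


def projected_energy(cluster_points, basis):
    """Mean squared distance to the centroid, measured inside the subspace."""
    X = np.asarray(cluster_points, dtype=float)
    centered = X - X.mean(axis=0)
    projected = centered @ basis
    return float(np.mean(np.sum(projected ** 2, axis=1)))


# Hypothetical cluster lying on the line y = x: it is not tight along
# either original axis, but it is tight along the direction (1, -1).
cluster = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
basis = least_spread_subspace(cluster, l=1)
energy = projected_energy(cluster, basis)
```

An axis-parallel method such as PROCLUS cannot describe this cluster with a single attribute, which is the motivation for arbitrarily oriented subspaces.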
9.3. The IPROCLUS Algorithm
Our algorithm, IPROCLUS, is based on PROCLUS. It takes the number of clusters
k and the average number of dimensions l in a cluster as inputs. It has three
phases: an initialization phase, an iterative phase, and a cluster refinement phase.
Compared to PROCLUS, we propose a modified Manhattan segmental distance
that is more accurate and meaningful in projected clustering. We add one more