Fig. 16.3. Comparison of inter-cluster error sum from the clustering of 5652 yeast genes based on
DNA expression levels in glucose pathway experiments, using different clustering algorithms. Each
gene contains 24 time points, giving a 24-dimensional feature vector. The inter-cluster error sum
measures the extent of dissimilarity between clusters, and should be maximized.
of probing for the optimal number of clusters using QTClust uses clusters of
inconsistent qualities. We further look in detail at the clustering results obtained
by QTClust and note that genes with up to 14 different feature points (60% of all
feature points) are in fact clustered together.
16.3.3. Inter-cluster Error Sum
This error sum indicates how different clusters are from one another and is given
by:
\[
\sum_{j=1}^{c} \sum_{k=1}^{s} \left( z_{jk} - \bar{z}_{jk} \right)^{2} .
\]
This is another measure of cluster quality, and it is desirable for the error
sum to be maximized. The inter-cluster error sum for the clustering of 5652
genes is shown in Fig. 16.3. Here, EP_GOS_Clust outperforms all the other
clustering algorithms. By using the intra-cluster error sum as the objective
function and requiring that the worst-fitting gene be extracted to seed new
clusters, EP_GOS_Clust explicitly seeks a minimal intra-cluster error sum and
implicitly searches for a configuration that maximizes the inter-cluster error sum.
Note that while K-Medians does well in obtaining a minimal intra-cluster error
sum, its performance in discerning dissimilar clusters is only average. This is due
to the 'localized' nature of the K-family of clustering methods, which for most
data structures tend to become 'stuck' within a limited vicinity of the
initialization point.
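
As a rough illustration of the two quality measures discussed in this section, the following Python sketch computes an inter-cluster and an intra-cluster error sum on toy data. It is not the authors' implementation. Two assumptions are made here that the text does not state: the reference value subtracted from each cluster-center coordinate z_jk is taken to be the mean of feature k over all cluster centers, and the intra-cluster error sum is taken to be the sum of squared distances from each gene to its assigned cluster center. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def inter_cluster_error_sum(centers: Sequence[Sequence[float]]) -> float:
    """Sum over clusters j and features k of (z_jk - zbar_k)^2.

    ASSUMPTION: the reference value zbar_k is the mean of feature k over
    all cluster centers; the source text does not define it.
    """
    c = len(centers)       # number of clusters
    s = len(centers[0])    # number of features per center
    # Mean of each feature across the c cluster centers (assumed reference).
    zbar = [sum(center[k] for center in centers) / c for k in range(s)]
    return sum(
        (center[k] - zbar[k]) ** 2 for center in centers for k in range(s)
    )


def intra_cluster_error_sum(
    genes: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Sequence[Sequence[float]],
) -> float:
    """ASSUMED form: squared distance of each gene to its assigned center."""
    return sum(
        (x - z) ** 2
        for gene, j in zip(genes, labels)
        for x, z in zip(gene, centers[j])
    )


# Hypothetical toy data: four "genes" with two features, in two clusters.
genes = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]
centers = [[0.0, 1.0], [10.0, 1.0]]

inter = inter_cluster_error_sum(centers)
intra = intra_cluster_error_sum(genes, labels, centers)
print(inter, intra)
```

Under these assumptions, a clustering with well-separated centers and tight clusters gives a large inter-cluster value and a small intra-cluster value, which is the combination the text describes as desirable.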