Fig. 16.4. Comparison of the difference between the intra-cluster and inter-cluster error sums from the
clustering of 5652 yeast genes based on DNA expression levels in glucose pathway experiments, using
different clustering algorithms. Each gene contains 36 time points, or a 24-dimensional feature vector.
This difference gives an overall measure of the error in the clusters formed and should be minimized.
16.3.4. Difference between Intra-cluster and Inter-cluster Error Sums
We also look at an overall measure of clustering quality: the difference between
the intra-cluster and inter-cluster error sums. Since it is desirable for the former to
be minimized and the latter maximized, an effective and rigorous clustering
algorithm will have a low value for this difference. The results are shown in Fig. 16.4.
Again, the EP GOS Clust is the best performer except in certain regions of low
cluster number, where K-Medians dominates owing to its capability to handle
outlier data points. At higher cluster numbers, however, the EP GOS Clust identifies
these outlier data points and isolates them in new clusters, and its clustering
performance subsequently overtakes that of K-Medians.
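The quality measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the intra-cluster error sum is the total squared Euclidean distance of each point to its cluster centroid, and that the inter-cluster error sum is the total squared distance of each cluster centroid to the global centroid. The function names `error_sums` and `error_difference` are hypothetical.

```python
import numpy as np

def error_sums(X, labels):
    """Return (intra, inter) squared-error sums for a hard clustering.

    Assumed definitions (not taken from the text):
      intra = sum over points of ||x - centroid of its cluster||^2
      inter = sum over clusters of ||cluster centroid - global centroid||^2
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    global_center = X.mean(axis=0)
    intra = 0.0
    inter = 0.0
    for k in np.unique(labels):
        members = X[labels == k]          # rows assigned to cluster k
        center = members.mean(axis=0)     # centroid of cluster k
        intra += ((members - center) ** 2).sum()
        inter += ((center - global_center) ** 2).sum()
    return intra, inter

def error_difference(X, labels):
    """Intra-cluster minus inter-cluster error sum; lower is better."""
    intra, inter = error_sums(X, labels)
    return intra - inter

# Toy data: two well-separated pairs of points.
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
good = error_difference(X, [0, 0, 1, 1])  # groups nearby points together
bad = error_difference(X, [0, 1, 0, 1])   # splits each natural pair
```

On this toy data the grouping that follows the natural pairs gives a much lower difference than the grouping that splits them, which is the behavior the comparison in Fig. 16.4 relies on.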
16.3.5. Optimal Number of Clusters
We compute the optimal number of clusters by applying a suitable weighting
factor to the two error sums and then finding the clustering balance. The
EP GOS Clust predicts the lowest optimal number of clusters: from Fig. 16.5, it
can be seen that EP GOS Clust predicts 237 clusters. On the other hand, K-Means
and KCorr, for instance, predict the optimal number of clusters to be around 700,
while K-Medians puts the number at around 450. Together with the quality of the
EP GOS Clust groupings in the previous comparisons, we infer the superior 'economy'
of the EP GOS Clust, which produces tighter data groupings with a lower
number of clusters. This matters because it is possible to achieve tight groupings
simply by using a large number of clusters, even with an inferior clustering algorithm.
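The search for the optimal number of clusters can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the formula, so the clustering balance is taken here to be the convex combination alpha * intra + (1 - alpha) * inter, and the optimal cluster number is taken to be the one that minimizes it. The names `clustering_balance` and `optimal_cluster_count`, the default alpha of 0.5, and the error sums in the usage example are all hypothetical and are not values from the chapter.

```python
def clustering_balance(intra, inter, alpha=0.5):
    """Weighted combination of the two error sums (assumed form)."""
    return alpha * intra + (1.0 - alpha) * inter

def optimal_cluster_count(error_sums_by_k, alpha=0.5):
    """Pick the cluster number with the smallest clustering balance.

    error_sums_by_k maps a cluster number k to a pair
    (intra-cluster error sum, inter-cluster error sum).
    """
    balances = {
        k: clustering_balance(intra, inter, alpha)
        for k, (intra, inter) in error_sums_by_k.items()
    }
    best_k = min(balances, key=balances.get)
    return best_k, balances

# Invented toy values: the intra-cluster sum falls and the inter-cluster sum
# rises as the number of clusters grows.
toy = {2: (100.0, 10.0), 4: (40.0, 30.0), 8: (10.0, 90.0)}
best_k, balances = optimal_cluster_count(toy)
```

Because the intra-cluster sum shrinks and the inter-cluster sum grows as clusters are added, the weighted sum has a minimum at an intermediate cluster number. Under these assumptions, an algorithm that reaches low intra-cluster error with few clusters will reach that minimum earlier.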