Table 16.2. Gene Ontology comparison between clusters found by different clustering approaches.
The table compares the log10(P) values of the clusters, which reflect the level of annotative richness,
as well as the proportion of yeast genes that fall into biologically significant clusters. The latter is
important in 'presenting' the maximal amount of relevant genetic information for follow-up work in
areas such as motif recognition and regulatory network inference. The shaded row contains the results
for EP GOS Clust, and the top three performers for each performance indicator are marked with an
asterisk.
results, 91% of the genes group into clusters with p-values under 0.01 and 87% of
the genes fall into clusters with p-values under 0.005, which is a significant
indication of clustering quality. Table 16.2 shows that EP GOS Clust performs well
against other clustering algorithms both in obtaining clusters with good overall
p-values (expressed as log10(p) values in this table) and in the proportion of genes
that are placed into significantly coherent clusters. We consider these to be two
broad tenets in assessing the strength of biological coherence. We would like to point
out that the EP GOS Clust procedure isolates errant data points as the clustering
progresses. Thus, in further analysis of the clusters, we have good justification to
treat these data points as irrelevant.
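
The two indicators discussed above (the log10(p) value of each cluster and the proportion of genes placed into clusters below a p-value threshold) can be illustrated with a minimal sketch. The function names and the toy cluster list are hypothetical and are not taken from Table 16.2; the sketch only assumes that each cluster is summarized by its gene count and a single p-value.

```python
import math


def fraction_in_significant_clusters(clusters, threshold):
    """Return the fraction of genes that sit in clusters with p-value < threshold.

    clusters: list of (n_genes, p_value) pairs, one pair per cluster.
    """
    total = sum(n for n, _ in clusters)
    if total == 0:
        raise ValueError("clusters contain no genes")
    significant = sum(n for n, p in clusters if p < threshold)
    return significant / total


def log10_p(clusters):
    """Return log10 of each cluster's p-value (p-values must be positive)."""
    return [math.log10(p) for _, p in clusters]


# Hypothetical toy data: (number of genes, cluster p-value).
toy_clusters = [(40, 1e-6), (30, 0.003), (20, 0.008), (10, 0.2)]

frac_001 = fraction_in_significant_clusters(toy_clusters, 0.01)
frac_0005 = fraction_in_significant_clusters(toy_clusters, 0.005)

print("fraction of genes in clusters with p < 0.01: ", frac_001)
print("fraction of genes in clusters with p < 0.005:", frac_0005)
print("log10(p) per cluster:", log10_p(toy_clusters))
```

With this toy input, 90 of 100 genes fall into clusters with p < 0.01 and 70 of 100 into clusters with p < 0.005. The figures of 91% and 87% quoted in the text are computed in the same spirit, but from the actual clustering results.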
16.3.7. Additional Constraints for Large Datasets
A close examination of the clustering results within each GOS iteration reveals
that the cluster size distribution does not change significantly over successive
iterations. This suggests that we can analyze the intermediate results from a
particular run and introduce additional constraints on the
number of clusters allowable in each size class without significantly compromising
the optimality of the final solution. Using $1 \le \sum_{j=1} w_{ij} \le n - c + 1$ as the
 