We note here that the appropriate value of the weighting factor α is 0.5. The factor α should
balance the contributions of the two error sums to the clustering balance.
At the extreme cluster numbers, that is, the largest and smallest numbers possible, the
weighted intra-cluster and inter-cluster error sums should
be balanced. In the minimal case, all the data points are placed into a single
cluster, so that the inter-cluster error sum is zero and the intra-cluster
error sum can be calculated with ease. In the maximal case, each data point forms
its own cluster, so that the intra-cluster error sum is zero and the
inter-cluster error sum can be easily found. The intra-cluster error sum
in the minimal case and the inter-cluster error sum in the maximal case are equal,
which suggests that the most appropriate weighting factor is in fact 0.5.
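The equality of the two extreme cases can be checked numerically. The sketch below assumes the usual definitions, which are not restated in this section: the intra-cluster error sum is the sum of squared distances from each point to its cluster centroid, and the inter-cluster error sum is the sum of squared distances from each cluster centroid to the global centroid. The function names and the toy data are illustrative only.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance (assumed error measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def intra_error(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)

def inter_error(clusters):
    """Sum of squared distances from each cluster centroid to the global centroid."""
    global_centroid = centroid([p for c in clusters for p in c])
    return sum(sq_dist(centroid(c), global_centroid) for c in clusters)

def clustering_balance(clusters, alpha=0.5):
    """Weighted combination of the two error sums."""
    return alpha * intra_error(clusters) + (1 - alpha) * inter_error(clusters)

# Toy data (hypothetical): four points in two dimensions.
data = [[0.0, 1.0], [2.0, 3.0], [4.0, -1.0], [1.0, 1.0]]
one_cluster = [data]                # minimal case: a single cluster
singletons = [[p] for p in data]    # maximal case: one cluster per point

print(intra_error(one_cluster), inter_error(singletons))
```

Under these assumed definitions the two printed values agree, so with α = 0.5 the clustering balance takes the same value at both extremes.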
These considerations suggest that for any clustering algorithm, including the GOS
algorithm, one can deduce the optimal number of clusters by repeating
the clustering process over a suitably large range of cluster numbers
and watching for the turning points of the clustering gain or clustering balance.
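A minimal sketch of such a sweep follows. It is not the GOS formulation: as a stand-in we use a trivial one-dimensional clustering rule that splits the sorted data at its largest gaps, and we assume the squared-distance form of the clustering balance with α = 0.5. All names and data are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def balance(clusters, alpha=0.5):
    """Clustering balance for 1-D clusters (assumed squared-distance form)."""
    global_mean = mean([x for c in clusters for x in c])
    intra = sum((x - mean(c)) ** 2 for c in clusters for x in c)
    inter = sum((mean(c) - global_mean) ** 2 for c in clusters)
    return alpha * intra + (1 - alpha) * inter

def split_at_largest_gaps(values, k):
    """Stand-in clustering: cut the sorted values at the k-1 largest gaps."""
    xs = sorted(values)
    by_gap = sorted(range(1, len(xs)), key=lambda i: xs[i] - xs[i - 1], reverse=True)
    cuts = [0] + sorted(by_gap[:k - 1]) + [len(xs)]
    return [xs[a:b] for a, b in zip(cuts, cuts[1:])]

def sweep(values, cluster_fn, k_values, alpha=0.5):
    """Cluster once per candidate k and return the k with the lowest balance."""
    scores = {k: balance(cluster_fn(values, k), alpha) for k in k_values}
    return min(scores, key=scores.get), scores

# Toy data (hypothetical): three tight groups near 1, 4 and 10.
values = [0.9, 1.0, 1.1, 3.9, 4.0, 4.1, 9.9, 10.0, 10.1]
best_k, scores = sweep(values, split_at_largest_gaps, range(1, len(values) + 1))
print(best_k)  # 3
```

For a stochastic algorithm, each cluster number would be run several times and the best or average balance recorded before the turning point is located.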
16.2.3. Proposed Algorithm
The discussion thus far points to the GOS formulation as a suitable clustering
algorithm. But for it to be effective, the formulation must be provided with a
good initialization point. We also want to incorporate the approach
for predicting the optimal number of clusters efficiently into the clustering algorithm. With these
considerations in mind, we propose the following GOS clustering algorithm with
enhanced data point positioning (EP GOS Clust).
Gene Pre-Clustering: We choose to pre-cluster genes based on the feature
pattern representation of their expression vectors. This conforms well to the
intuitive notion that two co-expressed genes have similarly shaped expression patterns,
and it avoids comparing the magnitudes of the two series of measurements [18]. In
our 24-dimensional expression vectors, only genes that differ from one another at two or fewer
expression vector points are pre-clustered together. Many of these
genes end up belonging to more than one pre-cluster. Since their specific
membership is in question, we take only the clusters formed by uniquely clustered genes.
As a result, we find 388 genes uniquely placed into 157 clusters.
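The pre-clustering step can be sketched as follows. Several details are assumptions made for illustration: the feature patterns are taken as given discrete vectors (shortened here to 8 points instead of 24), pre-clusters are taken to be maximal cliques of the link graph with at least two genes, and a pre-cluster is kept only if every one of its genes appears in no other pre-cluster. The gene labels and patterns are hypothetical.

```python
from itertools import combinations

def build_links(patterns, max_diff=2):
    """Link two genes when their patterns differ at max_diff or fewer points."""
    links = {g: set() for g in patterns}
    for g, h in combinations(patterns, 2):
        if sum(a != b for a, b in zip(patterns[g], patterns[h])) <= max_diff:
            links[g].add(h)
            links[h].add(g)
    return links

def maximal_cliques(links):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []
    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & links[v], excluded & links[v])
            candidates = candidates - {v}
            excluded = excluded | {v}
    expand(set(), set(links), set())
    return cliques

def unique_preclusters(links):
    """Keep cliques of size >= 2 whose genes all belong to exactly one clique."""
    cliques = [c for c in maximal_cliques(links) if len(c) > 1]
    membership = {}
    for c in cliques:
        for g in c:
            membership[g] = membership.get(g, 0) + 1
    return sorted(sorted(c) for c in cliques if all(membership[g] == 1 for g in c))

# Toy feature patterns (hypothetical). Gene E links to both D and F,
# but D and F are not linked, so E belongs to two pre-clusters.
patterns = {
    "A": (1, 1, 1, 1, 0, 0, 0, 0), "B": (1, 1, 1, 1, 0, 0, 0, 1),
    "C": (1, 1, 1, 1, 0, 0, 1, 1), "D": (0, 0, 0, 0, 1, 1, 1, 1),
    "E": (0, 0, 0, 0, 1, 1, 0, 0), "F": (0, 0, 0, 0, 0, 0, 0, 0),
    "G": (1, 0, 1, 0, 1, 0, 1, 0), "H": (1, 0, 1, 0, 1, 0, 1, 1),
}
preclusters = unique_preclusters(build_links(patterns))
print(preclusters)  # [['A', 'B', 'C'], ['G', 'H']]
```

In this toy case the two pre-clusters containing gene E are discarded because its membership is ambiguous, leaving five genes uniquely placed into two pre-clusters.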
We note in particular that when pre-clustering by finding complete cliques,
meaning that the pre-clustered genes belong uniquely to only one cluster
or, in other words, that there is a link between every pair of genes within the same cluster,
we could have iterated the process using various levels of pre-clustering criteria.
When the criterion is overly lenient, a large number of pre-clusters are formed,