There are several strategies to avoid missing the global optimum when
seeking the minimum solution. The most important of these is to select
or design a suitable OA with which the function can be repeatedly
sampled within the domains. The algorithm is automatically constrained
to stay within the function domain and will not request function
evaluations outside it.
Two stopping criteria are possible: the search ends either when the
target objective function value is reached, or when the maximum domain
length becomes smaller than a user-selected value. This research uses
the latter criterion, which corresponds to the variation allowed for
each gene element in the subset and can be as little as one gene. This
means that the global minimum has been found for a particular selection
range of each gene element, with a variation of less than one gene per
element. Strictly speaking, then, the global optimum is not defined at a
point, but as lying within a region.
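
The two stopping criteria can be illustrated with a minimal sketch. It
is not the implementation used in this research: the function name
should_stop and the parameters best_value, domain_lengths, target_value
and min_length are hypothetical names introduced here, and the domain
lengths are assumed to be measured in genes.

```python
def should_stop(best_value, domain_lengths, target_value=None, min_length=1.0):
    """Return True when either stopping criterion is met (illustrative only).

    best_value     -- lowest objective function value found so far
    domain_lengths -- current search-domain length for each gene element
    target_value   -- optional target objective value (first criterion)
    min_length     -- user-selected domain length (second criterion)
    """
    # Criterion 1: the target objective function value has been reached.
    if target_value is not None and best_value <= target_value:
        return True
    # Criterion 2: the largest remaining domain is smaller than the
    # user-selected length, so the optimum is located within a region.
    return max(domain_lengths) < min_length
```

Leaving target_value unset and min_length at one gene corresponds to
the second criterion alone, which is the choice described above.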
5.3.5. Multi-subset class predictor
Although SDL optimization yields an optimal gene subset of a given
length, the classification performance varies. For both the colon
cancer and leukemia datasets, there appears to be no guarantee that a
single gene subset can classify all of the samples in the testing set
correctly. Gene subsets of different lengths tend to misclassify, or
leave unclassified, different samples in the test datasets. Gene
subsets of the same length, by contrast, consistently misclassify the
same few samples, even though all of them are optimal subsets
identified by the optimization procedure. This indicates that the
length of the gene subset is the key factor in improving the
signal-to-noise ratio when classifying very noisy data such as
microarray gene expressions. Based on this observation, a multi-subset
class predictor was constructed that uses all 23 optimal gene subsets,
with lengths from 1 to 23 genes. The predictor involves at most 276
genes in total (1 + 2 + ... + 23). Because some genes appear in more
than one subset, the actual number of unique gene IDs is somewhat
smaller and varies from case to case.
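
The gene counts, and one possible way of combining the subsets, can be
sketched as follows. The text does not state how the 23 subset
predictions are combined, so the majority vote below is an assumption
made for illustration, and the names total_and_unique_genes and
majority_vote are hypothetical.

```python
from collections import Counter


def total_and_unique_genes(subsets):
    """Return (total gene slots, number of unique gene IDs) over all subsets."""
    total = sum(len(subset) for subset in subsets)
    unique = len(set().union(*subsets))
    return total, unique


def majority_vote(predictions):
    """Combine per-subset class labels by majority vote (assumed rule).

    predictions -- one label per gene subset, or None where that subset
                   left the sample unclassified.
    Returns the winning label, or None if no subset voted or the top
    two labels are tied.
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two leading labels
    return ranked[0][0]
```

With 23 subsets of lengths 1 to 23, the first function returns a total
of 276, and the number of unique IDs depends on how much the subsets
overlap.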