a particular selection problem should first address the issue of choosing a
reasonable starting solution, which is always a major obstacle for an
inexperienced operator. To find the true globally optimal solution for a gene
selection problem, one needs to solve an array of interlinked multi-dimensional
simultaneous equations. For a gene subset with more than just a few genes,
this was until recently a very difficult task, requiring a supercomputer and
highly skilled programming. With SDL global optimization, however, there is
no need to solve these equations: globally optimal solutions can be found at
an affordable computing cost through orthogonal sampling.
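This passage does not spell out how SDL performs its orthogonal sampling. As a generic illustration only, the sketch below implements orthogonal sampling in its common textbook sense: a refinement of Latin hypercube sampling in which the unit hypercube is divided into equal subspaces, each sampled exactly once, while every fine interval on every axis is still used exactly once. The function name `orthogonal_sample` and its parameters are hypothetical, and this is not the SDL algorithm itself.

```python
import itertools
import random
from typing import Dict, List, Tuple

def orthogonal_sample(dims: int, blocks: int, seed: int = 0) -> List[Tuple[float, ...]]:
    """Generic orthogonal sampling of the unit hypercube [0, 1)**dims.

    NOTE: a textbook Latin-hypercube variant for illustration, NOT the SDL method.
    Each axis is cut into `blocks` coarse blocks, giving blocks**dims subspaces,
    and each subspace receives exactly one point. Each axis is also cut into
    n = blocks**dims fine cells, and every fine cell is used exactly once per
    axis, so the points also form a Latin hypercube sample.
    """
    rng = random.Random(seed)
    n = blocks ** dims            # number of points (one per subspace)
    per_block = n // blocks       # fine cells per coarse block = blocks**(dims - 1)
    subspaces = list(itertools.product(range(blocks), repeat=dims))

    # fine[k][s]: fine-cell index on axis k assigned to subspace s
    fine: List[Dict[Tuple[int, ...], int]] = [{} for _ in range(dims)]
    for k in range(dims):
        for b in range(blocks):
            # The per_block subspaces lying in coarse block b of axis k share
            # that block's per_block fine cells, in a random one-to-one order.
            members = [s for s in subspaces if s[k] == b]
            offsets = rng.sample(range(per_block), per_block)
            for s, off in zip(members, offsets):
                fine[k][s] = b * per_block + off

    # Place each point uniformly at random inside its assigned fine cells.
    return [tuple((fine[k][s] + rng.random()) / n for k in range(dims))
            for s in subspaces]

points = orthogonal_sample(dims=2, blocks=3, seed=42)
print(len(points))  # 9
```

Because it enumerates all `blocks**dims` subspaces, this sketch is practical only for a few dimensions. Mapping the continuous points onto candidate gene subsets would need a further, problem-specific step that is not described here.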
It is worth observing that the established multi-subset class predictor
could be reduced in size by removing the first five or more short, unstable
gene subsets; the remaining subsets would still perform well, as shown
on the supporting website (Li, 2006). In general, removing them may even
improve the predictive strength. However, retaining the genes selected in
the short subsets may be significant to biologists, as these genes could
well be informative.
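This passage does not state how the subsets of a multi-subset predictor are combined. Assuming a simple majority vote across subsets, pruning the short subsets amounts to dropping the first members of the list before voting. The names `Member`, `prune`, `predict`, and the toy `threshold_member` classifier are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical predictor member: a gene subset (gene IDs used as column indices
# into a sample) plus a classifier that sees only the values of those genes.
Member = Tuple[Tuple[int, ...], Callable[[List[float]], str]]

def prune(members: List[Member], drop_first: int) -> List[Member]:
    """Drop the first `drop_first` members (assumed to be the short, unstable subsets)."""
    return members[drop_first:]

def predict(members: List[Member], sample: Sequence[float]) -> str:
    """Majority vote (an assumed combination rule); ties go to the label voted first."""
    votes = Counter(clf([sample[g] for g in genes]) for genes, clf in members)
    return votes.most_common(1)[0][0]

def threshold_member(genes: Tuple[int, ...], cutoff: float) -> Member:
    """Toy classifier: 'tumor' if the mean value of its genes exceeds `cutoff`."""
    return genes, lambda vals: "tumor" if sum(vals) / len(vals) > cutoff else "normal"

# Toy predictor: three short subsets followed by two longer ones (made-up values).
members = [threshold_member((0,), 0.5), threshold_member((1,), 0.5),
           threshold_member((0, 1), 0.5), threshold_member((0, 2, 3), 0.5),
           threshold_member((1, 2, 3), 0.5)]
sample = [0.1, 0.2, 0.9, 0.7]
print(predict(members, sample))            # normal (3 votes to 2)
print(predict(prune(members, 3), sample))  # tumor
```

The toy numbers only show that pruning changes which subsets vote; they say nothing about whether the pruned predictor is more accurate on real data.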
Another interesting observation is that few genes play a markedly more
important role than the others. The most frequently appearing genes in
the colon cancer predictor were 249 and 164, which appeared 10 and 8
times, respectively. Most of the genes in the predictor were selected only
once. The situation is quite similar for the leukemia predictor: genes
2642 and 4050 were the most frequently used, each being included 16 times.
Both the gene IDs assigned in this study and the actual gene accession
numbers from the original datasets are listed in Tables 5.11 and 5.12,
respectively. The gene appearance frequency for the colon class predictor
is also given in Table 5.11.
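Appearance frequencies of the kind reported in Table 5.11, and the "most frequently appearing genes" used in the seven-gene experiment, reduce to counting how many subsets of a predictor contain each gene. The sketch below is minimal and assumes the predictor is available as a list of gene-ID lists. The function names and toy subsets are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Iterable, List, Sequence

def gene_frequencies(subsets: Iterable[Sequence[int]]) -> Counter:
    """Count, for each gene ID, how many subsets of the predictor contain it."""
    counts: Counter = Counter()
    for subset in subsets:
        # dict.fromkeys removes duplicates while keeping order, so a gene
        # counts at most once per subset; each gene adds 1.
        counts.update(dict.fromkeys(subset, 1))
    return counts

def top_k_genes(subsets: Iterable[Sequence[int]], k: int) -> List[int]:
    """IDs of the k most frequently appearing genes (ties: first encountered)."""
    return [gene for gene, _ in gene_frequencies(subsets).most_common(k)]

# Toy subsets with made-up gene IDs; not the colon or leukemia predictors.
toy_predictor = [[1, 2, 3], [1, 4], [2, 1, 5], [6]]
print(gene_frequencies(toy_predictor)[1])  # 3
print(top_k_genes(toy_predictor, 2))       # [1, 2]
```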
Some previous studies proposed finding many near-optimal gene subsets
with a well-tuned genetic algorithm (GA) procedure and picking the 50-200
most frequently appearing genes to construct a long gene subset as a
predictor (Li et al., 2001b). Although such a predictor performed
reasonably well, the heavy computation it requires may be neither
affordable nor cost-effective, and may not even be necessary. One more
experiment was quickly carried out by forming a subset of the seven
most frequently appearing genes identified from the colon cancer
predictor: genes 249, 164, 2000, 245, 567, 66, and 581. Using such