repeated until the search space was small enough (e.g. fewer than 11 genes
left) or the objective function could not be improved any further. The
top-ranked gene subset in the last round of optimization was chosen as the
optimal solution for the 23-gene subset. The optimization was run 23 times,
once for each gene subset length (23, 22, …, 2, 1), yielding a total of 23
optimal solutions. Together, the 23 optimal solutions formed a multi-subset
cancer class predictor, which was then used to classify the samples in the
test dataset. All 23 gene subsets were arranged to form a pyramidal
layer-by-layer hierarchy, with the shortest subset (one gene) at the top
and the longest subset (23 genes) at the bottom (see Table 5.7 and
Table 5.9 for details).
5.3.2. Objective function
An objective function, also called a fitness or merit function, measures
the ability of a selected gene subset to classify the training set samples
according to the SDL optimization procedure. Several methods can be used to
construct an objective function for optimization and gene selection
algorithms, including neighborhood analysis (Golub et al., 1999), support
vector machines (Peng et al., 2003; Liu et al., 2005), and k-nearest
neighbors (KNN) (Li et al., 2001a). Among them, KNN is used for the
proposed SDL global optimization because it is easy to compute. The
Euclidean distance between a single sample (represented by its pattern
vector $V_m$) and each of the pattern vectors of the training set
containing $M$ samples is calculated:
$V_m = (g_1, g_2, \ldots, g_n)$, where $n$ is the number of genes in the
vector that can be set from 1 to 23 in order to form the gene vectors (or
subsets) with different lengths; $g_n$ is the expression level of the $n$th
gene in the $m$th sample; $m = 1, 2, \ldots, M$. For the colon cancer
dataset, $M = 40$; for the leukemia dataset, $M = 38$.
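
The distance calculation described above can be illustrated with a minimal sketch. It uses the standard Euclidean distance between two pattern vectors $V_a$ and $V_b$ of length $n$, $d(V_a, V_b) = \sqrt{\sum_{i=1}^{n} (g_{a,i} - g_{b,i})^2}$. The function names `euclidean_distances` and `nearest_neighbors`, the toy expression values, and the choice k = 2 are assumptions made for illustration only; they do not come from the original text or from any particular software package. The sketch requires Python 3.8 or later for `math.dist`.

```python
import math

def euclidean_distances(sample, training_vectors):
    """Return the Euclidean distance from `sample` to each training vector."""
    return [math.dist(sample, v) for v in training_vectors]

def nearest_neighbors(sample, training_vectors, k):
    """Return the indices of the k training vectors closest to `sample`."""
    distances = euclidean_distances(sample, training_vectors)
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return order[:k]

# Toy data (invented for illustration): three training samples, n = 2 genes.
training_vectors = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
sample = (0.0, 0.0)

distances = euclidean_distances(sample, training_vectors)
neighbors = nearest_neighbors(sample, training_vectors, k=2)
print(distances)
print(neighbors)
```

In a real setting each training vector would hold the expression levels of the n selected genes for one of the M training samples, and the class labels of the returned neighbors would be passed to whatever voting rule the classifier uses.
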
Each sample is classified according to the class membership of
its KNN as determined by the Euclidean distance in n -dimensional space.
If all or a majority of the KNN of a sample belong to the same class, the
sample is classified as that class; otherwise, the sample is considered