a multi-subset class predictor containing up to 23 gene subsets based on a
given microarray gene expression data collection, such as the colon cancer
or leukemia data, within a period of several hours.
For the convenience of computation, every gene was assigned a unique
integer ID number (from 1 to 2000 for colon, and from 1 to 7129 for
leukemia) according to its order in the original dataset. The aim was to
study how changes in the choices of various gene element variables for a
gene subset with a given length affect a response variable (success rate in
classifying training samples). For each of the gene elements that are used
to form a gene subset, 11 choices (levels) were selected for inclusion in
the OA sampling based on $L_{242}(11^{23})$, that is, 242 runs over 23
factors with 11 levels each. Those 11 gene IDs were spaced at equal
intervals across the current search space, with a step equal to the length
of the search space divided by 10. Some shifting of the selected gene IDs
was necessary to avoid repeating any gene within a single gene subset. A
total of 242 subsets were evaluated with the objective function in the
current iteration, and the top-performing 10% of subsets were used to
reduce the search space. Only the two top-performing gene subsets were
passed to the next iteration.
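
The level selection described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the procedure
used in the study: the search space is taken to be an inclusive range of
gene IDs, the step is the distance between the first and last ID divided
by 10, and the "shifting" is modeled as moving a repeated ID to the next
unused one. The function names level_gene_ids and shift_duplicates are
hypothetical.

```python
def level_gene_ids(lo, hi, n_levels=11):
    """Return n_levels gene IDs spaced evenly over the inclusive range [lo, hi].

    Assumption: the step is (hi - lo) / (n_levels - 1), i.e. the span of the
    search space divided by 10 when n_levels is 11. Integer arithmetic keeps
    the first level at lo and the last level at hi.
    """
    span = hi - lo
    return [lo + (k * span) // (n_levels - 1) for k in range(n_levels)]


def shift_duplicates(subset, lo, hi):
    """Shift any repeated gene ID to the next unused ID, wrapping within [lo, hi].

    Assumption: this is one simple way to avoid repeated genes in a subset;
    it requires the subset to be no longer than the search space.
    """
    seen = set()
    result = []
    for gene in subset:
        while gene in seen:
            gene = lo + (gene - lo + 1) % (hi - lo + 1)
        seen.add(gene)
        result.append(gene)
    return result


colon_levels = level_gene_ids(1, 2000)
leukemia_levels = level_gene_ids(1, 7129)
print(colon_levels)
print(shift_duplicates([200, 200, 400], 1, 2000))
```

With 11 levels per gene element, each row of the orthogonal array picks
one level per element, and a shift step of this kind would be applied to
any row in which two elements land on the same gene ID.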
Within the search space of 2000 genes for the colon cancer data
and 7129 genes for the leukemia data, SDL global optimization found
23 optimal gene subsets with different lengths, from 1 gene to 23 genes,
that formed two pyramidal hierarchy class predictors, respectively (see
Tables 5.7 and 5.9). Those gene subsets were assumed to be the best-
performing gene combinations for classifying the gene datasets used in
this study. The selected gene subsets were then used to classify the test
samples in both the colon and leukemia datasets. Tables 5.8 and 5.10 show
the classification results. Once the validation of all 23 optimal gene
subsets was completed, the proposed multi-subset voting mechanism was
adopted. For each test sample, a single classification result (1 for
class 1, -1 for class 2, and 0 for unclassified) was obtained by balancing
the votes from the 23 gene subsets. It is a process of counting votes to
make a final decision on the class of the sample under test. For example,
sample N28 in the colon dataset (shown in Table 5.8)
receives 23 votes in total from the 23 gene subsets in the class predictor.
Among the 23 votes, there are 11 votes for normal, 6 for tumor, and 6 for
unknown. The class code (1 for normal, -1 for tumor, and 0 for unknown)
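
The vote counting described above can be sketched as follows. This is a
minimal illustration, assuming class codes of 1, -1, and 0 and assuming
that "balancing" means summing the codes and taking the sign of the total;
the text does not spell out the exact rule, and the function name vote is
hypothetical.

```python
def vote(codes):
    """Combine per-subset class codes (1, -1, 0) into one final class code.

    Assumption: the final class is the sign of the summed codes, so votes of
    0 (unknown) do not move the balance and a tie yields 0 (unclassified).
    """
    total = sum(codes)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0


# Votes reported for sample N28: 11 normal (1), 6 tumor (-1), 6 unknown (0).
n28_votes = [1] * 11 + [-1] * 6 + [0] * 6
print(vote(n28_votes))
```

Under this assumed rule, the N28 votes sum to 11 - 6 = 5, which is
positive, so the sample would be assigned the class coded 1 (normal).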