5.3.6. Validation (predicting through a voting mechanism)
The established multi-subset class predictor is validated with the testing
datasets for both the colon cancer and the leukemia data. Each gene
subset in the predictor predicts the class of every sample in the testing
datasets independently according to the same KNN rules (k = 5) used in
the training stage. The predicted class code (in colon cancer data: 1 for
normal, −1 for tumor, 0 for unknown; and in leukemia data: 1 for ALL,
−1 for AML, and 0 for unknown) is assigned to the particular sample
accordingly. Each single class code is treated as a single vote. For each
sample in the testing datasets, up to 23 votes, one from each of the 23
gene subsets in the predictor, can be obtained. The final class predicted
by the predictor depends on the sign of the sum of the 23 votes of the
sample under test. A positive sign indicates that more gene subsets in the
predictor vote for class 1 (normal for colon cancer and ALL for leukemia),
and the sample is finally classified as 1 by the multi-subset predictor.
A negative sign indicates that more gene subsets in the predictor
vote for class −1 (tumor for colon cancer and AML for leukemia), and
the sample is finally classified as −1 by the multi-subset predictor. When
the sum is 0, equal numbers of the 23 gene subsets vote for class 1 and
class −1; in this case, the corresponding sample is classified as 0
(unknown or unclassified). The actual values of the classification results
are easy to interpret: the absolute value of the sum of the 23 votes
indicates the prediction strength. The larger the value, the more
confident the prediction.
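The voting rule described above can be illustrated with a minimal sketch in Python. It is not the authors' program: the function name combine_votes and the example vote counts are hypothetical, and the per-subset KNN predictions are assumed to have been computed already and encoded as 1, −1, or 0.

```python
from typing import Sequence, Tuple


def combine_votes(votes: Sequence[int]) -> Tuple[int, int]:
    """Combine per-subset class codes (1, -1, or 0) into a final call.

    Returns (final_class, strength): final_class is the sign of the vote
    sum, and strength is the absolute value of that sum.
    """
    if any(v not in (-1, 0, 1) for v in votes):
        raise ValueError("each vote must be -1, 0, or 1")
    total = sum(votes)
    final_class = (total > 0) - (total < 0)  # sign of the sum: 1, -1, or 0
    return final_class, abs(total)


# Hypothetical votes from 23 gene subsets for one test sample:
# 15 subsets vote 1, 6 vote -1, and 2 return 0 (unknown).
example_votes = [1] * 15 + [-1] * 6 + [0] * 2
final_class, strength = combine_votes(example_votes)
print(final_class, strength)  # 1 9
```

In this made-up example the sum is 15 − 6 = 9, so the sample is assigned to class 1 with a prediction strength of 9 out of a possible 23. Because 23 is odd, a sum of 0 can only occur when at least one subset returns 0 (unknown).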
5.4. Experimental Results
A Microsoft Windows-based computer program with a user-friendly
graphical interface has been written. The entire experimental computation
was carried out on a personal laptop computer (1.7 GHz Intel Pentium
Pro/II/III). The software can be downloaded from
http://www.scis.ecu.edu.au/dli/ (Li, 2006) and is available free to researchers. Both the colon
cancer and the leukemia samples were classified 100% correctly. The
classification process is automated once the gene expression data
are input. The program can find the global optimum solutions and construct