Biomedical Engineering Reference
the symbols are confined to their appropriate tissue type circle segment. This is the
case whenever a clinical sample (point) is correctly classified. Only a few individual
tissue samples (mostly filled circles) fall outside the correct circle segment. That so
few samples are misclassified reflects the high accuracy, a 96% correct assignment to
known clinical cancer class, that we achieved using this three-class classifier on this
microarray biosensor-derived clinical cancer data set. Similarly high accuracies
were obtained by applying the same techniques to a two-class prediction of lymphoma
cell subtypes based on analysis of microarray biosensor experiments (180). This example
demonstrates that supervised machine learning approaches can be trained on examples of
known class to correctly classify complex biosensor input. The trained algorithm can
then be applied to yield accurate output predictions for data from new samples.
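The train-then-predict workflow described above can be sketched in a few lines. This is only an illustration: the nearest-centroid rule, the function names, and the toy data are assumptions made for the sketch, not the classifier or data used in the study.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def train_centroids(samples, labels):
    """Training step: average the feature vectors of each known class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Prediction step: assign a sample to the class with the nearest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted class matches the known class."""
    hits = sum(predict(centroids, s) == y for s, y in zip(samples, labels))
    return hits / len(labels)


# Hypothetical toy data: two features per sample, three known classes.
train_x = [[0.1, 0.2], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9], [9.8, 0.1], [10.1, 0.0]]
train_y = ["A", "A", "B", "B", "C", "C"]
model = train_centroids(train_x, train_y)

# A new, unlabeled sample is classified with the trained model.
new_sample_class = predict(model, [0.2, 0.1])
```

In practice, accuracy would be estimated on samples held out from training rather than on the training set itself.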
1.4.3 Applying Supervised Machine Learning to the NCI Compounds' Effects on Cancer Cells
In the second example of applying a supervised machine learning approach, we examined
the output from the National Cancer Institute's (NCI) internal testing program, in which
tens of thousands of compounds from its cancer compound library were tested against 60
cancer cell lines grown in tissue culture. For each of the 60 cancer cell lines, the NCI
determined each compound's GI50 value, that is, the compound concentration needed to
achieve 50% growth inhibition of that cell line. These 60 cancer cell lines were selected
as representatives of nine clinically different cancer tissue types. In this data mining
example, we examined the 60 test results for each of 1400 compounds (181). We restricted
the analysis to this subset of the total NCI library because most of the compounds in
the larger library had significant numbers of missing values. Our approach is
illustrated in Figure 1.53, where we present the RadViz statistical classification
output from a three-class classifier. This output shows that we identified small subsets
of the 1400 compounds whose GI50 test data were most effective at separating the three
cancer cell line classes: the melanoma class (14 compounds), the leukemia class
(30 compounds), and the nonmelanoma, nonleukemia class (8 compounds). The NSC
identification numbers of these compounds are arrayed around the circumference at their
specific positions within the RadViz sectors corresponding to the three classes. The
data for these NSC compounds, vectorially summed, were used to position the tested cell
lines (points) within the RadViz display.
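The vector summation that positions each cell line can be sketched as follows. This is a minimal sketch of a standard RadViz projection and makes several assumptions: anchors are spaced evenly around the unit circle (in Figure 1.53 the compounds are instead grouped into class sectors), each compound's values are min-max normalized across cell lines, and the numbers shown are hypothetical rather than NCI data.

```python
import math


def radviz_anchors(n):
    """Place n dimensional anchors (one per compound) evenly on the unit circle."""
    return [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
            for j in range(n)]


def min_max_normalize(rows):
    """Rescale each column (compound) to [0, 1] across all rows (cell lines)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # A constant column has zero range; use 1.0 to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def radviz_point(weights, anchors):
    """Weighted vector sum of the anchors, divided by the total weight."""
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)  # no pull from any anchor: place at the center
    x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
    return (x, y)


# Hypothetical values: 3 cell lines (rows) by 4 compounds (columns).
values = [
    [7.9, 5.1, 5.0, 5.2],
    [5.0, 7.8, 5.1, 5.0],
    [5.1, 5.0, 7.7, 7.9],
]
anchors = radviz_anchors(4)
points = [radviz_point(row, anchors) for row in min_max_normalize(values)]
```

Each point is pulled toward the anchors of the compounds with the largest normalized values for that cell line, so cell lines with similar response profiles land in the same region of the circle.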
Interestingly, in this study we discovered a pattern in the chemical character of the two
compound subsets that best discriminated melanoma and leukemia cells from each other and
from nonmelanoma, nonleukemia cells. Of the 14 compounds most effective against
melanoma, 11 were identified as substituted p-quinones, and all 11 had an internal
quinone ring structure of the type shown in Figure 1.54. In contrast, of the 30 compounds
most effective against leukemia, eight were identified as substituted p-quinones, and six
of the eight had an external p-quinone ring structure, also illustrated in Figure 1.54.
The discovery of the quinone compounds' cancer type specificity and its implications are
thoroughly discussed elsewhere (177,181).
This example analyzing data on chemical compounds, coupled with the previous example
analyzing microarray biosensor data, illustrates the power and broad applicability of
machine learning approaches for discovering interesting patterns in any kind of data,
including complex biosensor inputs. In fact, the discoveries we described would not have
been possible with traditional analysis approaches. Although these particular examples contained