every metric that is defined. Using the correct classification procedure would
prevent over-fitting later.
Defining metrics achieves two important goals. First, relevant information
is retained in a small number of measures relative to the number of points
in a spectrum (dimensionality reduction). Second, the information captured by
metrics is spectroscopically or morphologically significant and, therefore,
provides a framework for interpreting the results of subsequent classifications.
Choosing metrics on the basis of this prior scientific knowledge lets us
leverage our knowledge base to the fullest. At the same time, it does not
preclude the discovery of new features, since these can easily be included and
tested for significance. Many classification techniques are based on linear
transformations, and metrics are a very good way of including spectroscopically
significant nonlinear information (ratios of peaks, centers of gravity of
peaks, peak positions, etc.) that is otherwise difficult to incorporate. While
it may appear that a superset of metrics is better obtained with computer
algorithms, we argue that such algorithms are themselves written by expert
spectroscopists. Hence, the manual examination proposed here offers all the
benefits of algorithmic feature detection. Its outcome does, however, vary with
the practitioner, so thorough and meticulous efforts, which are time-consuming,
must be undertaken. As an example, the metrics for the prostate study reported
here were developed over a period of a year.
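The kinds of nonlinear metrics listed above (peak ratios, centers of gravity, peak positions) can be illustrated with a short sketch. It assumes a spectrum stored as a NumPy array of absorbance values on a wavenumber axis in cm⁻¹; the band windows and the function names (`peak_height`, `peak_position`, `center_of_gravity`, `peak_ratio`, `metric_vector`) are hypothetical and chosen for illustration only, not the metrics actually used in the prostate study.

```python
import numpy as np

def _window(wavenumbers, lo, hi):
    """Boolean mask selecting points with lo <= wavenumber <= hi."""
    return (wavenumbers >= lo) & (wavenumbers <= hi)

def peak_height(wavenumbers, spectrum, lo, hi):
    """Maximum absorbance inside the band window [lo, hi]."""
    return float(spectrum[_window(wavenumbers, lo, hi)].max())

def peak_position(wavenumbers, spectrum, lo, hi):
    """Wavenumber at which absorbance is largest inside [lo, hi]."""
    mask = _window(wavenumbers, lo, hi)
    return float(wavenumbers[mask][np.argmax(spectrum[mask])])

def center_of_gravity(wavenumbers, spectrum, lo, hi):
    """Absorbance-weighted mean wavenumber of the band in [lo, hi]."""
    mask = _window(wavenumbers, lo, hi)
    w, a = wavenumbers[mask], spectrum[mask]
    return float(np.sum(w * a) / np.sum(a))

def peak_ratio(wavenumbers, spectrum, band_a, band_b):
    """Ratio of the peak heights of two bands, each given as (lo, hi)."""
    return (peak_height(wavenumbers, spectrum, *band_a)
            / peak_height(wavenumbers, spectrum, *band_b))

def metric_vector(wavenumbers, spectrum):
    """Reduce one spectrum to a short vector of metrics.

    The two band windows are hypothetical, for illustration only.
    """
    band_1 = (1600.0, 1700.0)
    band_2 = (1500.0, 1580.0)
    return np.array([
        peak_ratio(wavenumbers, spectrum, band_1, band_2),
        peak_position(wavenumbers, spectrum, *band_1),
        center_of_gravity(wavenumbers, spectrum, *band_1),
    ])
```

For a spectrum sampled every 1 cm⁻¹ from 1000 to 1800 cm⁻¹, `metric_vector` reduces 801 points to three numbers. All three metrics are nonlinear in the absorbance values (a quotient, an argmax, and a ratio of weighted sums), which is why a purely linear transformation of the spectrum cannot produce them.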
After pre-processing, the various tissue classes in the data are identified
by comparison with a gold standard (hematoxylin and eosin-stained tissue
analyzed by an expert pathologist). Regions corresponding to different classes
are marked, and spectral data are extracted from these regions. The entire data
set consists of two independent parts, namely the training set and the testing
set. The training set is used to extract information and learn the characteristics
of the metric data as they relate to the predefined classes. The testing set is used
to quantify the accuracy of the classifier and provides the platform for validation.
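The marking and splitting steps might look like the following sketch. It assumes that the pathologist's annotation is available as an integer label image aligned with an image of metric values (0 for unmarked pixels, k > 0 for class k), and that each extracted pixel carries a sample identifier. The text does not say how independence between the two sets was enforced, so splitting by whole samples rather than by individual pixels is an assumption made here; the function names are hypothetical.

```python
import numpy as np

def extract_labeled_pixels(metric_image, label_image):
    """Collect metric vectors for pixels that were marked by the pathologist.

    metric_image: (rows, cols, n_metrics) array of metric values.
    label_image:  (rows, cols) int array; 0 = unmarked, k > 0 = class k.
    Returns (X, y): X has shape (n_marked, n_metrics), y has shape (n_marked,).
    """
    mask = label_image > 0
    return metric_image[mask], label_image[mask]

def split_by_sample(sample_ids, test_fraction=0.5, seed=0):
    """Assign whole samples (not individual pixels) to training or testing.

    Returns boolean masks (train, test) over the pixels. All pixels of a
    given sample land on the same side, keeping the two sets independent.
    """
    rng = np.random.default_rng(seed)
    unique = np.unique(sample_ids)
    rng.shuffle(unique)  # random order of samples, in place
    n_test = max(1, int(round(test_fraction * len(unique))))
    test = np.isin(sample_ids, unique[:n_test])
    return ~test, test
```

Keeping every pixel of a sample on one side prevents neighboring, highly correlated pixels from the same tissue from appearing in both sets, which would otherwise inflate the accuracy measured on the testing set.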
Among the various classification techniques that could be used, a Bayesian
classifier was selected because of the comprehensive nature of the available data
and because the histologic categories, or classes, are defined by clinical practice.
Since the number of data points is large (> 1 million), estimating the probability
density functions (pdfs) of the different classes becomes straightforward. A
Bayesian classifier is therefore a good choice.
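A minimal sketch of a Bayesian classifier whose class-conditional pdfs are estimated directly from histograms of the training data follows. It assumes that the metrics are conditionally independent given the class (a naive Bayes simplification that the text does not state) and takes the class priors to be the class frequencies in the training set; the class name `HistogramNaiveBayes`, the bin count, and the probability floor `eps` are hypothetical. Each pixel is assigned the class k that maximizes log P(k) + Σ_j log p(x_j | k).

```python
import numpy as np

class HistogramNaiveBayes:
    """Bayesian classifier with class-conditional pdfs estimated by histograms.

    Assumes metrics are conditionally independent given the class, so each
    class pdf is the product of one 1-D histogram per metric.
    """

    def __init__(self, n_bins=50, eps=1e-9):
        self.n_bins = n_bins
        self.eps = eps  # floor that avoids log(0) for empty bins

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n_features = X.shape[1]
        # Bin edges per metric, shared by all classes, spanning the training range.
        self.edges_ = [np.linspace(X[:, j].min(), X[:, j].max(), self.n_bins + 1)
                       for j in range(n_features)]
        # Priors from class frequencies in the training set.
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # log_pdf_[k][j]: log density of metric j in each bin, for class k.
        self.log_pdf_ = []
        for c in self.classes_:
            Xc = X[y == c]
            per_metric = []
            for j in range(n_features):
                dens, _ = np.histogram(Xc[:, j], bins=self.edges_[j], density=True)
                per_metric.append(np.log(dens + self.eps))
            self.log_pdf_.append(per_metric)
        return self

    def _bin_index(self, values, j):
        # Map values to histogram bins; out-of-range values go to the edge bins.
        idx = np.searchsorted(self.edges_[j], values, side="right") - 1
        return np.clip(idx, 0, self.n_bins - 1)

    def predict(self, X):
        # Log posterior (up to a constant) for every pixel and class.
        scores = np.tile(self.log_prior_, (X.shape[0], 1))
        for k in range(len(self.classes_)):
            for j in range(X.shape[1]):
                scores[:, k] += self.log_pdf_[k][j][self._bin_index(X[:, j], j)]
        return self.classes_[np.argmax(scores, axis=1)]
```

Histogram density estimates need many samples per bin to be reliable, which is the point made above: with more than a million labeled pixels, even fairly fine bins are well populated. In use, `HistogramNaiveBayes().fit(X_train, y_train).predict(X_test)` returns a predicted class for every test pixel, which can then be compared against the pathologist's labels.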
The results demonstrate a high degree of correlation between the gold
standard and the predicted data (Fig. 8.12). The quality of prediction can be
evaluated by examining the images or, more rigorously, by receiver operating
characteristic (ROC) analysis. The last step is to verify that the accuracy
obtained is sufficient for clinical use. It could be argued that choosing the
correct metrics is more important than the specific classification technique
used. If the metrics have inherent differences (high variance and high
separability), different classification techniques would all bring out these
differences; however, no classification algorithm would work for a