In this section, we present a parallel version of the σ-classifier that significantly reduces the time to complete a classification analysis by efficiently distributing the computational work to many processors.
11.3.1 Design of σ-Classifiers and Feature Selection
When designing classifiers, we often do not know which features are required. There-
fore, the selection of good features is important in addition to the design of specific
classifiers. A classifier design method should provide a reasonable estimation of error
for each classifier relative to other classifiers, to help find the desired features. If
the number of samples available for analysis is very limited, then error estimation
for the classifiers becomes difficult. To alleviate these problems, the
σ-classifier is
designed from a probability distribution resulting from spreading the mass of the
sample points via a circular distribution to make classification more difficult, while
maintaining sample geometry. The algorithm is parameterized by the variance of the
circular distribution. By considering increasing variances, the algorithm finds feature
sets whose classification accuracy remains strong relative to greater spreading of the
sample. The error then gives a measure of the strength of the feature set as a function
of the variance. The σ-classifier designs classifiers and estimates errors analytically
to minimize the computational load. This property is crucial because of the immense
size of the feature space that will be searched.
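The spreading idea can be illustrated with a linear decision boundary: if each sample point's mass is spread by a circular (isotropic) Gaussian, the mass falling on the wrong side of the boundary has a closed form, so the error can be computed analytically rather than by sampling. The function below is only a sketch; the names, the restriction to two dimensions and a linear classifier, and the ±1 label convention are illustrative assumptions, not the chapter's actual algorithm.

```python
import math

def sigma_error(points, labels, w, b, sigma):
    """Analytic error estimate for a linear classifier w.x + b = 0
    when each sample point's mass is spread by a circular Gaussian
    of standard deviation sigma (illustrative sketch).

    points: list of 2-D sample points; labels: +1 or -1 per point.
    """
    norm = math.hypot(*w)
    total = 0.0
    for x, y in zip(points, labels):
        # Signed distance from the boundary, oriented so that a
        # positive value means the point lies on its correct side.
        d = y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        # Mass of the circular Gaussian centered at the point that
        # falls on the wrong side of the boundary: Phi(-d / sigma).
        total += 0.5 * math.erfc(d / (sigma * math.sqrt(2.0)))
    return total / len(points)
```

As the variance grows, more of each point's mass spills across the boundary, so the estimated error rises, which matches the section's use of increasing variances to gauge the strength of a feature set.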
An exhaustive search of combinatorial space results in the best feature sets for a σ-classifier. This approach has been successfully applied to a few sets of microarray data of reasonable size containing a few thousand genes. Even though the σ-classifier algorithm is designed for this type of search, as the number of features increases,
the computational load increases significantly, often becoming computationally prohibitive. If n is the total number of features and k is the number of features in a classifier, then there are M = C(n, k) = n!/(k!(n − k)!) classifiers to design and M σ-errors to estimate. Even
with reasonably sized data, n being larger than a few thousand, M may be so large
that it is not feasible to perform an analysis on a single CPU. Therefore, parallel
processing becomes inevitable.
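To make the growth of the search space concrete, M can be evaluated directly with Python's math.comb; the choice of k = 3 and the sample values of n below are illustrative:

```python
from math import comb

# Number of k-feature classifiers to design for n total features,
# with k = 3 features per classifier.
for n in (100, 1000, 2345, 10000):
    print(n, comb(n, 3))
```

Already at n = 2345 the count is over two billion, which is why the per-classifier index soon overflows standard integer types, as discussed below.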
11.3.2 Parallel Implementation of the σ-Classifier
When designing a parallel implementation, distributing the computational work evenly among the processors minimizes the time some processors sit idle while others are still working. The most efficient way to distribute the work
would be to take the total number of classifiers and divide them equally among the
processors. However, the number of classifiers quickly exceeds the largest signed 32-bit integer: with three features per classifier, only 2345 features would be possible. Using unsigned 32-bit integers does not solve the problem, as only 2954 features would be possible. Although using 64-bit integers is an option, we preferred a simple, sub-optimal method for distributing the work, one that does not depend on such system constraints and proved surprisingly efficient for our study.
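One simple scheme in this spirit (an illustrative assumption, not necessarily the exact method used in the study) interleaves the values of the first feature index across processors; each processor enumerates only its own triples and never needs the total count M as an integer:

```python
from itertools import combinations

def my_classifiers(n, rank, nprocs):
    """Yield the 3-feature index triples handled by processor `rank`
    out of `nprocs` processors, interleaving on the first index so
    the load is roughly balanced without ever computing the total
    number of classifiers M."""
    # First index i can range over 0 .. n-3; each processor takes
    # every nprocs-th value starting at its own rank.
    for i in range(rank, n - 2, nprocs):
        for j, l in combinations(range(i + 1, n), 2):
            yield (i, j, l)
```

Because the amount of work per first index shrinks as the index grows, interleaving the indices (rather than splitting their range into contiguous blocks) keeps the per-processor load roughly balanced.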