previous support vectors is reduced. The support vectors with the lowest weight
are discarded to limit the memory required for the algorithm.
The last approach to classification we discuss in this chapter is based on combining multiple classifiers to form a single, improved classifier. In bagging, multiple instances of the dataset are created by drawing subsets of examples from the training set, and a classifier is built from each of these subsets. Each of these classifiers is known as a component classifier. Classification of a test example is performed by taking the majority vote of all the component classifiers [46]. Bagging can be parallelized trivially by training the component classifiers on different nodes [64].
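To make the procedure concrete, the following is a minimal Python sketch of bagging under a few assumptions that the text does not specify: the subsets are drawn by bootstrap sampling, labels are in {-1, +1}, and scikit-learn decision trees serve as the component classifiers; the function names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_train(X, y, n_components=25, seed=0):
    """Build one component classifier per bootstrap subset of the training set."""
    rng = np.random.default_rng(seed)
    n = len(X)
    components = []
    for _ in range(n_components):
        idx = rng.integers(0, n, size=n)   # draw a subset of examples (with replacement)
        components.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return components

def bagging_predict(components, X):
    """Classify a test example by the majority vote of all component classifiers."""
    votes = np.stack([clf.predict(X) for clf in components])
    return np.sign(votes.sum(axis=0))      # assumes labels in {-1, +1}
```

Because each component classifier depends only on its own subset, the loop over n_components is exactly the part that can be distributed across nodes.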
Boosting is a more sophisticated approach to combining classifiers. Here, a simple (or weak) classifier is trained on the data, as shown in Algorithm 3. Then, those points of the training set that are incorrectly classified are identified and given a higher weight. Another weak classifier is trained on the weighted data to improve the classification of these incorrectly classified points. This process is repeated until a sufficiently low training error is reached.
The training of the weak classifiers can be performed either by drawing points from the training data proportionally to error or by selecting a subset of the incorrectly classified points. The former algorithm is known as AdaBoost [65], which is arguably the most popular boosting algorithm; the latter is the local boosting algorithm [66].
Several approaches exist for training boosting classifiers with large data. Online
[67] and semi-online (also known as filtering) methods [68] have been demon-
strated in the literature. In the latter, it is assumed that a very large number of
examples are provided to the learner, one at a time. Instead of sampling a fixed
dataset according to a distribution as in Algorithm 3, each new example is ac-
cepted with a probability relative to this distribution. Once enough samples have
been accepted, a weak learner is trained and the examples are discarded. This pro-
cess continues until enough weak learners have been trained. Thus, very large datasets can be handled because only a small portion of the data is held in memory at any one time.
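As a rough illustration of the filtering idea, the sketch below processes a stream of examples and accepts each one with a probability derived from the current ensemble's margin on it. The acceptance rule min(1, e^(-margin)), the batch size, and the helper names are simplifying assumptions made here, not the exact schemes of [68].

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def filtering_round(stream, weak_learners, alphas, batch_size=500, rng=None):
    """Accept streamed examples by rejection sampling, then train one weak learner.

    `stream` yields (x, y) pairs with y in {-1, +1}; `weak_learners` and `alphas`
    are the classifiers and weights accumulated in earlier rounds."""
    rng = rng or np.random.default_rng()
    X_batch, y_batch = [], []
    for x, y in stream:
        # Margin of the current ensemble on this example: large and positive
        # means it is already handled well, so it should rarely be accepted.
        score = sum(a * clf.predict(x.reshape(1, -1))[0]
                    for a, clf in zip(alphas, weak_learners))
        accept_prob = min(1.0, np.exp(-y * score))
        if rng.random() < accept_prob:
            X_batch.append(x)
            y_batch.append(y)
        if len(X_batch) == batch_size:
            break                          # enough samples have been accepted
    weak = DecisionTreeClassifier(max_depth=1).fit(np.array(X_batch), np.array(y_batch))
    return weak                            # the accepted examples can now be discarded
```

Each call trains one weak learner from a small in-memory batch and then discards the batch, so the memory footprint stays bounded regardless of how many examples the stream provides.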
Similar to bagging, boosting can be parallelized trivially by parallel training of
the weak learner. Alternatively, each node can train a weak learner on a subset of
the data and share the resulting hypothesis with all other nodes [64]. The nodes
Algorithm 3 Pseudo-Code for the AdaBoost Algorithm
1: Given: Training set T, comprised of n patterns x and corresponding labels y ∈ {−1, +1}.
2: Initialize: W_1(i) = 1/n
3: for k = 1 to M do
4:   Train weak classifier C_k using T sampled according to W_k
5:   Estimate E_k, the error of C_k measured on T sampled according to W_k
6:   Let a_k = (1/2) ln((1 − E_k) / E_k)
7:   W_{k+1}(i) = (W_k(i) / Z_k) × e^{−a_k} if the ith sample is correctly classified, or (W_k(i) / Z_k) × e^{+a_k} if the ith sample is incorrectly classified, where Z_k is a normalization factor ensuring that W_{k+1} sums to one
8: end for
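For reference, a minimal Python rendering of Algorithm 3 follows. It uses decision stumps as the weak classifier C_k and computes E_k as the weighted training error on the full set; both are concrete choices made for this sketch rather than requirements of the algorithm, and the function names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, M=50, seed=0):
    """Train up to M weak classifiers following Algorithm 3; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(X)
    W = np.full(n, 1.0 / n)              # step 2: W_1(i) = 1/n
    classifiers, alphas = [], []
    for k in range(M):                   # step 3
        # Step 4: train weak classifier C_k on T sampled according to W_k.
        idx = rng.choice(n, size=n, replace=True, p=W)
        C_k = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        # Step 5: estimate E_k as the weighted error of C_k on the training set.
        pred = C_k.predict(X)
        E_k = np.sum(W[pred != y])
        if E_k <= 0 or E_k >= 0.5:       # stop if C_k is perfect or no better than chance
            break
        # Step 6: a_k = (1/2) ln((1 - E_k) / E_k)
        a_k = 0.5 * np.log((1.0 - E_k) / E_k)
        # Step 7: e^{-a_k} for correctly classified samples, e^{+a_k} for incorrect ones.
        W = W * np.exp(-a_k * y * pred)
        W /= W.sum()                     # division by Z_k: renormalize the weights
        classifiers.append(C_k)
        alphas.append(a_k)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    """Combine the weak classifiers by their a_k-weighted vote."""
    scores = sum(a * clf.predict(X) for a, clf in zip(alphas, classifiers))
    return np.sign(scores)
```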