previous support vectors is reduced. The support vectors with the lowest weight
are discarded to limit the memory required for the algorithm.
The last approach to classification we discuss in this chapter is based on combining multiple classifiers to form a single, improved classifier. In bagging, multiple instances of the dataset are created by drawing subsets of examples from the training set, and a classifier is built from each of these subsets. Each of these classifiers is known as a component classifier. Classification of a test example is performed by taking the majority vote of all the component classifiers [46]. Bagging can be parallelized trivially by training the component classifiers on different nodes [64].
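To make the procedure concrete, the following is a minimal Python sketch of bagging under a few assumptions that the text does not specify: the subsets are drawn by bootstrap sampling, labels are in {-1, +1}, and scikit-learn decision trees serve as the component classifiers; the function names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_train(X, y, n_components=25, seed=0):
    """Build one component classifier per bootstrap subset of the training set."""
    rng = np.random.default_rng(seed)
    n = len(X)
    components = []
    for _ in range(n_components):
        idx = rng.integers(0, n, size=n)   # draw a subset of examples (with replacement)
        components.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return components

def bagging_predict(components, X):
    """Classify a test example by the majority vote of all component classifiers."""
    votes = np.stack([clf.predict(X) for clf in components])
    return np.sign(votes.sum(axis=0))      # assumes labels in {-1, +1}
```

Because each component classifier depends only on its own subset, the loop over n_components is exactly the part that can be distributed across nodes.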
Boosting is a more sophisticated approach to combining classifiers. Here, a simple (or weak) classifier is trained on the data, as shown in Algorithm 3. Then, those points of the training set that are incorrectly classified are identified and given a higher weight. Another weak classifier is trained on the weighted data to improve the classification of these incorrectly classified points. This process is repeated until a sufficiently low training error is reached.
The training of the weak classifiers can be performed either by drawing points from the training data proportionally to error or by selecting a subset of the incorrectly classified points. The former algorithm is known as AdaBoost [65], which is arguably the most popular boosting algorithm; the latter is the local boosting algorithm [66].
Several approaches exist for training boosting classifiers with large data. Online
[67] and semi-online (also known as filtering) methods [68] have been demon-
strated in the literature. In the latter, it is assumed that a very large number of
examples are provided to the learner, one at a time. Instead of sampling a fixed
dataset according to a distribution as in Algorithm 3, each new example is ac-
cepted with a probability relative to this distribution. Once enough samples have
been accepted, a weak learner is trained and the examples are discarded. This pro-
cess continues until enough weak learners have been trained. Thus, very large datasets can be handled because only a small portion of the data is held in memory at any one time.
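As a rough illustration of the filtering idea, the sketch below processes a stream of examples and accepts each one with a probability derived from the current ensemble's margin on it. The acceptance rule min(1, e^(-margin)), the batch size, and the helper names are simplifying assumptions made here, not the exact schemes of [68].

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def filtering_round(stream, weak_learners, alphas, batch_size=500, rng=None):
    """Accept streamed examples by rejection sampling, then train one weak learner.

    `stream` yields (x, y) pairs with y in {-1, +1}; `weak_learners` and `alphas`
    are the classifiers and weights accumulated in earlier rounds."""
    rng = rng or np.random.default_rng()
    X_batch, y_batch = [], []
    for x, y in stream:
        # Margin of the current ensemble on this example: large and positive
        # means it is already handled well, so it should rarely be accepted.
        score = sum(a * clf.predict(x.reshape(1, -1))[0]
                    for a, clf in zip(alphas, weak_learners))
        accept_prob = min(1.0, np.exp(-y * score))
        if rng.random() < accept_prob:
            X_batch.append(x)
            y_batch.append(y)
        if len(X_batch) == batch_size:
            break                          # enough samples have been accepted
    weak = DecisionTreeClassifier(max_depth=1).fit(np.array(X_batch), np.array(y_batch))
    return weak                            # the accepted examples can now be discarded
```

Each call trains one weak learner from a small in-memory batch and then discards the batch, so the memory footprint stays bounded regardless of how many examples the stream provides.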
Similar to bagging, boosting can be parallelized trivially by parallel training of
the weak learner. Alternatively, each node can train a weak learner on a subset of
the data and share the resulting hypothesis with all other nodes [64]. The nodes
Algorithm 3 Pseudo-Code for the AdaBoost Algorithm
1: Given: Training set T, comprised of n patterns x and corresponding labels y ∈ {−1, +1}.
2: Initialize: W_1(i) = 1/n
3: for k = 1 to M do
4:   Train weak classifier C_k using T sampled according to W_k
5:   Estimate E_k, the error of C_k measured on T sampled according to W_k
6:   Let a_k = (1/2) ln((1 − E_k) / E_k)
7:   W_{k+1}(i) = (W_k(i) / Z_k) × e^{−a_k} if the ith sample is correctly classified, or (W_k(i) / Z_k) × e^{+a_k} if the ith sample is incorrectly classified, where Z_k is a normalization factor ensuring that W_{k+1} sums to one
8: end for
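For reference, a minimal Python rendering of Algorithm 3 follows. It uses decision stumps as the weak classifier C_k and computes E_k as the weighted training error on the full set; both are concrete choices made for this sketch rather than requirements of the algorithm, and the function names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, M=50, seed=0):
    """Train up to M weak classifiers following Algorithm 3; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(X)
    W = np.full(n, 1.0 / n)              # step 2: W_1(i) = 1/n
    classifiers, alphas = [], []
    for k in range(M):                   # step 3
        # Step 4: train weak classifier C_k on T sampled according to W_k.
        idx = rng.choice(n, size=n, replace=True, p=W)
        C_k = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        # Step 5: estimate E_k as the weighted error of C_k on the training set.
        pred = C_k.predict(X)
        E_k = np.sum(W[pred != y])
        if E_k <= 0 or E_k >= 0.5:       # stop if C_k is perfect or no better than chance
            break
        # Step 6: a_k = (1/2) ln((1 - E_k) / E_k)
        a_k = 0.5 * np.log((1.0 - E_k) / E_k)
        # Step 7: e^{-a_k} for correctly classified samples, e^{+a_k} for incorrect ones.
        W = W * np.exp(-a_k * y * pred)
        W /= W.sum()                     # division by Z_k: renormalize the weights
        classifiers.append(C_k)
        alphas.append(a_k)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    """Combine the weak classifiers by their a_k-weighted vote."""
    scores = sum(a * clf.predict(X) for a, clf in zip(alphas, classifiers))
    return np.sign(scores)
```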