Boosting or Arcing [42] introduces a weighting for the voting (or averaging) process. Weights are chosen inversely proportional to the error probability in order to emphasise the 'difficult cases' [43]. One option of realising the weighting is to sample these instances repeatedly according to their weight. By that, the construction of the ensemble follows an iterative procedure wherein the error probabilities observed for the individually learnt classifiers determine the weighting. Usually, one obtains better results than with Bagging; however, degradations may also occur [5]. In any case, the computational effort is higher owing to the iterative procedure. One of the most popular Boosting algorithms is Adaptive Boosting, or AdaBoost for short. 'Adaptive' refers to the iterative focus on the cases producing errors.

Let x_l, l = 1, ..., L be the feature vectors in the training set 𝓛. As for SVMs or DTs, the original AdaBoost algorithm is suited only for two classes. The variant AdaBoost.M1, however, is an extension suited for multiple classes M. By that, the class assignment for the instance x_l ∈ 𝓛, L = |𝓛|, is given by y_l ∈ {1, ..., M}. To each instance x_l ∈ 𝓛 a weight w_l is assigned. These are all initialised as 1/L and are, as indicated, considered during the computation of a weighted error measure and during the training of the classifier. The core of the algorithm is now the determination of the instance weights w_l and of the classifier weights 1/β_t for the classifier with index t, where β_t depends on the error probability ε_t of the classifier.
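As a minimal sketch of how such a weighting can be realised by resampling, the following NumPy snippet draws a training subset in which each instance is picked with probability proportional to its current weight. The function name sample_by_weight and the toy data are purely illustrative and do not stem from the text.

```python
import numpy as np

def sample_by_weight(X, y, w, rng=np.random.default_rng(0)):
    """Draw |L| instances with replacement, each selected with
    probability proportional to its current weight w_l."""
    p = np.asarray(w, dtype=float)
    p = p / p.sum()                      # interpret the weights as a probability distribution
    idx = rng.choice(len(X), size=len(X), replace=True, p=p)
    return X[idx], y[idx]

# Toy example: L = 4 instances, all weights initialised as 1/L
X = np.array([[0.1], [0.4], [0.35], [0.8]])
y = np.array([1, 1, 2, 2])
w = np.full(len(X), 1.0 / len(X))
X_sub, y_sub = sample_by_weight(X, y, w)
```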
Given the training set 𝓛 and a number T of time steps t = 1, ..., T, the following steps are carried out:
1. A classifier with the decision ŷ_t : X → {1, ..., M} is trained on 𝓛 considering the weights w_l. As indicated, this can be realised by sampling a sub-set according to the weights as a probability distribution.
2. The weighted classification error ε_t is computed based on the weights:

   \varepsilon_t = \sum_{l:\, \hat{y}_{t,l} \neq y_l} w_l . \qquad (7.74)
3. If ε_t > 1/2, then repeat steps 1-3; terminate after N repetitions.
4. Else compute β_t for the classifier as

   \beta_t = \begin{cases} 10^{-10} & \text{if } \varepsilon_t = 0, \\ \dfrac{\varepsilon_t}{1 - \varepsilon_t} & \text{else,} \end{cases} \qquad (7.75)

   where the constant 10^{-10} is arbitrarily chosen to avoid division by zero in Eq. (7.77) below.
5. If ε_t ≠ 0, then the new weights w_l', as used in the following iterations, result in

   w_l' = \begin{cases} w_l \, \beta_t & \text{if } \hat{y}_{t,l} = y_l, \\ w_l & \text{else.} \end{cases} \qquad (7.76)
6. The weights w_l' are normalised for their sum to be one (a compact sketch of Eqs. (7.74)–(7.76) follows right after this list).
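The following sketch bundles the computations of Eqs. (7.74)–(7.76) for a single iteration t, assuming that the predictions of the weak classifier from step 1 are already available as a NumPy array. The function name adaboost_m1_step and the early-exit handling of step 3 are illustrative choices, not prescribed by the text.

```python
import numpy as np

def adaboost_m1_step(y_true, y_pred, w):
    """One AdaBoost.M1 weight update; returns (beta_t, new_weights),
    or (None, w) if the classifier is rejected in step 3."""
    w = np.asarray(w, dtype=float)
    wrong = (y_pred != y_true)
    eps_t = w[wrong].sum()                     # Eq. (7.74): weighted error
    if eps_t > 0.5:                            # step 3: error too high, reject classifier
        return None, w
    beta_t = 1e-10 if eps_t == 0 else eps_t / (1.0 - eps_t)   # Eq. (7.75)
    if eps_t != 0:                             # step 5: update only if some error occurred
        w = np.where(wrong, w, w * beta_t)     # Eq. (7.76): shrink correctly classified cases
    w = w / w.sum()                            # step 6: normalise to sum one
    return beta_t, w
```

Multiplying only the correctly classified instances by β_t ≤ 1 and renormalising increases the relative weight of the misclassified, 'difficult' cases for the next iteration, which is exactly the adaptive focus described above.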
The decision ŷ_Ada of the ensemble classifier is then given by the weighted majority vote over all T members,

   \hat{y}_{\mathrm{Ada}} = \underset{m \in \{1, \ldots, M\}}{\arg\max} \; \sum_{t:\, \hat{y}_t = m} \log \frac{1}{\beta_t} . \qquad (7.77)
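A minimal sketch of this weighted vote for a single test instance is given below; it assumes that the class labels are 1, ..., M and that the stored β_t values and the individual classifier decisions are passed in as lists. The helper name adaboost_m1_decide is made up for illustration.

```python
import numpy as np

def adaboost_m1_decide(preds, betas, num_classes):
    """Eq. (7.77): the class collecting the largest sum of log(1/beta_t)
    over the classifiers that vote for it wins."""
    votes = np.zeros(num_classes)
    for y_t, beta_t in zip(preds, betas):
        votes[y_t - 1] += np.log(1.0 / beta_t)   # classes are labelled 1, ..., M
    return int(np.argmax(votes)) + 1

# Example: three classifiers voting on one instance (M = 2 classes)
print(adaboost_m1_decide(preds=[2, 2, 1], betas=[0.30, 0.25, 0.40], num_classes=2))  # -> 2
```

Classifiers with a low error probability ε_t have a small β_t and therefore receive a large voting weight log(1/β_t).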