algorithms. The choice of learning algorithms for these two levels is often based on experience and exploration, as a full theoretical understanding is still missing in the literature. However, statistical classifiers, DTs, and SVMs as introduced previously can reasonably be combined at level-0 [46]. In contrast, these seem to be less suited at level-1, where mostly Multiple Linear Regression (MLR) is chosen. MLR differs from simple linear regression only by the use of multiple input variables.
In the case of regression, confidences $P_{k,i}(x) \in [0; 1]$ are assumed per base learner $k = 1, \ldots, K$ and each class $i = 1, \ldots, M$. If the level-0 classifier $k$ only decides for exactly one class $i$ without provision of its confidence, i.e., $\hat{y}_k = i$, the level-1 decision by MLR is as follows:

$$P_{k,i}(x) = \begin{cases} 0 & \text{if } \hat{y}_k(x) \neq i, \\ 1 & \text{else.} \end{cases} \qquad (7.79)$$
Applying non-negative weighting coefficients $\alpha_{k,i}$ per class and learner, the computation of the MLR per class $i$ is obtained by:

$$\mathrm{MLR}_i(x) = \sum_{k=1}^{K} \alpha_{k,i}\, P_{k,i}(x). \qquad (7.80)$$
During the recognition phase, the class $i$ with the highest $\mathrm{MLR}_i(x)$ is chosen for an observed unknown feature vector $x$, i.e., the decision $\hat{y}$ is:

$$\hat{y} = \arg\max_i \mathrm{MLR}_i(x). \qquad (7.81)$$
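As an illustration, the following Python/NumPy sketch combines hard or soft level-0 outputs according to Eqs. (7.79)-(7.81); the function and variable names are chosen freely here and do not stem from the text:

```python
import numpy as np

def one_hot_confidences(hard_decisions, num_classes):
    """Turn hard level-0 decisions into 0/1 confidences as in Eq. (7.79)."""
    P = np.zeros((len(hard_decisions), num_classes))
    P[np.arange(len(hard_decisions)), hard_decisions] = 1.0
    return P

def mlr_decision(P, alpha):
    """Weighted combination (Eq. (7.80)) and arg-max decision (Eq. (7.81)).

    P[k, i]     : confidence P_{k,i}(x) of level-0 learner k for class i
    alpha[k, i] : non-negative weight alpha_{k,i}
    """
    mlr = (alpha * P).sum(axis=0)   # MLR_i(x) = sum_k alpha_{k,i} P_{k,i}(x)
    return int(np.argmax(mlr))      # y_hat = argmax_i MLR_i(x)
```

For instance, with $K = 3$ learners and $M = 4$ classes, both P and alpha are arrays of shape (3, 4).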
A high value of $\alpha_{k,i}$ thus indicates a high confidence in the performance of learner $k$ for the determination of class $i$ [40]. For the determination of the coefficients $\alpha_{k,i}$, the least-squares method of Lawson and Hanson can be used, which will not be described here. For each learner $k = 1, \ldots, K$, the optimisation problem to be solved results in the minimisation of the following expression, in which $j$ represents the index of the training sub-set of the $J$-fold cross-validation:
$$\sum_{j=1}^{J} \sum_{l=1}^{L} \sum_{i=1}^{M} \left( y_l - \alpha_{k,i}\, P_{k,i,j}(x) \right)^2. \qquad (7.82)$$
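In practice, such a constrained fit can be carried out with SciPy's nnls routine, which implements the non-negative least-squares algorithm of Lawson and Hanson. The sketch below fits one non-negative regression per class over the cross-validated level-0 confidences; the data layout and names (P_cv, fit_alpha) are assumptions for illustration, and the per-class grouping is the common stacking formulation rather than a literal transcription of Eq. (7.82):

```python
import numpy as np
from scipy.optimize import nnls  # Lawson and Hanson non-negative least squares

def fit_alpha(P_cv, y, num_classes):
    """Estimate non-negative weights alpha_{k,i} from cross-validated confidences.

    P_cv[l, k, i] : confidence P_{k,i,j}(x_l) of learner k for class i on
                    training instance l (gathered over all J folds)
    y[l]          : true class index of instance l
    """
    L, K, M = P_cv.shape
    alpha = np.zeros((K, M))
    for i in range(num_classes):
        A = P_cv[:, :, i]             # L x K design matrix for class i
        b = (y == i).astype(float)    # 0/1 targets indicating class i
        alpha[:, i], _ = nnls(A, b)   # constrained least squares, alpha >= 0
    return alpha
```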
In [45] it is shown that meta-classification based on the actual confidences of the level-0 learners results in an improvement in the majority of cases as opposed to Eq. (7.79). This is known as StackingC, short for Stacking with Confidences [46]. In [45], a description of how to obtain confidence values for diverse learners is given. Simpler alternatives use either an unweighted majority vote or a vote based on the mean confidences. This can also be applied in the case of regression.
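For completeness, these two simpler fusion rules could look as follows in the same notation; again, the helper names are illustrative only:

```python
import numpy as np

def majority_vote(hard_decisions, num_classes):
    """Unweighted majority vote over K hard level-0 decisions (class indices)."""
    return int(np.bincount(hard_decisions, minlength=num_classes).argmax())

def mean_confidence_vote(P):
    """Decide for the class with the highest mean confidence over all learners."""
    return int(P.mean(axis=0).argmax())
```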
Overall, ensemble learning increases the computation effort linearly. Whereas Bagging and Stacking methods can be distributed across several CPUs for parallelisation, this is not possible for the iterative Boosting process. The lowest error rate is usually