algorithms. The choice of learning algorithms for these two levels is often based on experience and exploration, as a full theoretical understanding is still missing in the literature. However, statistical classifiers, DTs, and SVMs as introduced previously can reasonably be combined at level-0 [46]. In contrast, these seem to be less suited at level-1, where mostly Multiple Linear Regression (MLR) is chosen. MLR differs from simple linear regression only by the use of multiple input variables.
In the case of regression, confidences $P_{k,i}(x) \in [0; 1]$ are assumed per base learner $k = 1, \ldots, K$ and each class $i = 1, \ldots, M$. If the level-0 classifier $k$ only decides for exactly one class $i$ without provision of its confidence, i.e., $\hat{y}_k = i$, the level-1 decision by MLR is as follows:

$$P_{k,i}(x) = \begin{cases} 0 & \text{if } \hat{y}_k(x) \neq i, \\ 1 & \text{else.} \end{cases} \qquad (7.79)$$
Applying non-negative weighting coefficients $\alpha_{k,i}$ per class and learner, the computation of the MLR per class $i$ is obtained by:

$$\mathrm{MLR}_i(x) = \sum_{k=1}^{K} \alpha_{k,i}\, P_{k,i}(x). \qquad (7.80)$$
During the recognition phase, the class $i$ with the highest $\mathrm{MLR}_i(x)$ is chosen for an observed unknown feature vector $x$, i.e., the decision $\hat{y}$ is:

$$\hat{y} = \arg\max_i \mathrm{MLR}_i(x). \qquad (7.81)$$
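As an illustration, the following Python/NumPy sketch combines hard or soft level-0 outputs according to Eqs. (7.79)-(7.81); the function and variable names are chosen freely here and do not stem from the text:

```python
import numpy as np

def one_hot_confidences(hard_decisions, num_classes):
    """Turn hard level-0 decisions into 0/1 confidences as in Eq. (7.79)."""
    P = np.zeros((len(hard_decisions), num_classes))
    P[np.arange(len(hard_decisions)), hard_decisions] = 1.0
    return P

def mlr_decision(P, alpha):
    """Weighted combination (Eq. (7.80)) and arg-max decision (Eq. (7.81)).

    P[k, i]     : confidence P_{k,i}(x) of level-0 learner k for class i
    alpha[k, i] : non-negative weight alpha_{k,i}
    """
    mlr = (alpha * P).sum(axis=0)   # MLR_i(x) = sum_k alpha_{k,i} P_{k,i}(x)
    return int(np.argmax(mlr))      # y_hat = argmax_i MLR_i(x)
```

For instance, with $K = 3$ learners and $M = 4$ classes, both P and alpha are arrays of shape (3, 4).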
A high value of $\alpha_{k,i}$ thus indicates a high confidence in the performance of learner $k$ for the determination of class $i$ [40]. For the determination of the coefficients $\alpha_{k,i}$, the least-squares method of Lawson and Hanson can be used, which will not be described here. For each learner $k = 1, \ldots, K$, the optimisation problem to be solved results in the minimisation of the following expression, in which $j$ represents the index of the training sub-set of the $J$-fold cross-validation:
$$\sum_{j=1}^{J} \sum_{l=1}^{L} \sum_{i=1}^{M} \left( y_l - \alpha_{k,i}\, P_{k,i,j}(x) \right)^2. \qquad (7.82)$$
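In practice, such a constrained fit can be carried out with SciPy's nnls routine, which implements the non-negative least-squares algorithm of Lawson and Hanson. The sketch below fits one non-negative regression per class over the cross-validated level-0 confidences; the data layout and names (P_cv, fit_alpha) are assumptions for illustration, and the per-class grouping is the common stacking formulation rather than a literal transcription of Eq. (7.82):

```python
import numpy as np
from scipy.optimize import nnls  # Lawson and Hanson non-negative least squares

def fit_alpha(P_cv, y, num_classes):
    """Estimate non-negative weights alpha_{k,i} from cross-validated confidences.

    P_cv[l, k, i] : confidence P_{k,i,j}(x_l) of learner k for class i on
                    training instance l (gathered over all J folds)
    y[l]          : true class index of instance l
    """
    L, K, M = P_cv.shape
    alpha = np.zeros((K, M))
    for i in range(num_classes):
        A = P_cv[:, :, i]             # L x K design matrix for class i
        b = (y == i).astype(float)    # 0/1 targets indicating class i
        alpha[:, i], _ = nnls(A, b)   # constrained least squares, alpha >= 0
    return alpha
```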
In [45] it is shown that meta-classification based on the actual confidences of the level-0 learners results in an improvement in the majority of cases as opposed to Eq. (7.79). This is known as StackingC, short for Stacking with Confidences [46]. In [45], a description of how to obtain confidence values for diverse learners is given. Simpler alternatives use either an unweighted majority vote or a vote based on the mean confidences. This can also be applied in the case of regression.
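For completeness, these two simpler fusion rules could look as follows in the same notation; again, the helper names are illustrative only:

```python
import numpy as np

def majority_vote(hard_decisions, num_classes):
    """Unweighted majority vote over K hard level-0 decisions (class indices)."""
    return int(np.bincount(hard_decisions, minlength=num_classes).argmax())

def mean_confidence_vote(P):
    """Decide for the class with the highest mean confidence over all learners."""
    return int(P.mean(axis=0).argmax())
```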
Overall, ensemble learning increases the computation effort linearly. Whereas Bagging and Stacking methods can be distributed across several CPUs for parallelisation, this is not possible for the iterative Boosting process. The lowest error rate is usually