Audio Recognition - Intelligent Audio Analysis - page 120


[Figure: trellis diagram. Vertical axis: Audio Events 1 to 3, each subdivided into audio sub-events (ASE); horizontal axis: time t.]

Fig. 7.13 Viterbi search of the optimal audio event sequence: trellis diagram for the hierarchical recognition of audio events that consist of audio sub-events (ASE). The backtracking path is shown over time, and squares represent feature vector observations. HMMs (one per ASE) are shown schematically in Bakis topology. After backtracking, the sequence of audio events 2, 1, 3 is recognised.

current step in time. In this way, the 'beam width' is broadened or narrowed according to whether the scores of the competing paths rise or decline. This width determines the trade-off between higher accuracy (broader beam) and higher speed (narrower beam).
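The pruning idea can be illustrated with a minimal sketch. It assumes the common simplification of a fixed beam relative to the best log-score in the current time step, not the adaptive widening and narrowing described above. The names `prune_beam`, `hypotheses` and `beam_width` are illustrative and do not come from the text.

```python
def prune_beam(hypotheses, beam_width):
    """Keep only hypotheses whose log-score lies within beam_width of the best.

    hypotheses : dict mapping a state label to its accumulated log-score
    beam_width : non-negative float; larger values keep more paths alive
    (Assumption: fixed-width beam, a simplification of the adaptive scheme.)
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    threshold = best - beam_width
    return {state: score for state, score in hypotheses.items()
            if score >= threshold}


# Three competing paths at the current time step (made-up scores).
active = {"s0": -1.0, "s1": -3.5, "s2": -9.0}
narrow = prune_beam(active, beam_width=2.0)  # faster, fewer paths survive
wide = prune_beam(active, beam_width=5.0)    # slower, more paths survive
```

With the narrow beam only the best path survives, while the wide beam also keeps the second path. This is the accuracy versus speed trade-off in miniature: a path that is currently weak but would win later can only be recovered if the beam is wide enough to keep it.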

Subsequently, at audio event transitions, the LM value is added to the score, and the search jumps to the first state of the first model of the new audio event. In addition, the required backtracking information is stored.

Finally, the best audio event sequence is obtained at the end of the search by the usual backtracking, and the recognised audio events and their boundaries are output.
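The three steps (adding the LM value at event transitions, storing backtracking information, and backtracking at the end) can be sketched as follows. This is a simplified illustration under stated assumptions, not the decoder of the text: each audio event's chain of ASE models is flattened into one left-to-right state chain, state transition probabilities inside a model are ignored, and no beam pruning is applied. The function `decode_events` and its parameters are hypothetical names.

```python
import math


def decode_events(emis, n_states, lm):
    """Viterbi search over concatenated left-to-right event models.

    emis[t][(event, state)] : log-likelihood of frame t in that state
    n_states[event]         : number of states in the event's flattened chain
    lm[(prev, nxt)]         : log-score added at an event transition
    Returns a list of (event, start_frame, end_frame).
    """
    neg = float("-inf")
    keys = [(e, s) for e in n_states for s in range(n_states[e])]
    # Frame 0: every event may only start in its first state.
    score = {k: (emis[0][k] if k[1] == 0 else neg) for k in keys}
    back = [{}]  # back[t][key] = (predecessor key, crossed event boundary?)
    for t in range(1, len(emis)):
        new, bp = {}, {}
        for (e, s) in keys:
            cands = [(score[(e, s)], (e, s), False)]  # stay in the state
            if s > 0:
                # Advance within the same event.
                cands.append((score[(e, s - 1)], (e, s - 1), False))
            else:
                # Event transition: leave the last state of any event,
                # add the LM value, enter the first state of the new event.
                for p in n_states:
                    last = (p, n_states[p] - 1)
                    cands.append((score[last] + lm.get((p, e), neg), last, True))
            best, prev, crossed = max(cands, key=lambda c: c[0])
            new[(e, s)] = best + emis[t][(e, s)]
            bp[(e, s)] = (prev, crossed)  # backtracking information
        score = new
        back.append(bp)
    # Backtracking from the best event-final state.
    finals = [(e, n_states[e] - 1) for e in n_states]
    cur = max(finals, key=lambda k: score[k])
    segments, end = [], len(emis) - 1
    for t in range(len(emis) - 1, 0, -1):
        prev, crossed = back[t][cur]
        if crossed:
            segments.append((cur[0], t, end))
            end = t - 1
        cur = prev
    segments.append((cur[0], 0, end))
    return segments[::-1]


# Toy data: event "A" has two states, event "B" has one (made-up values).
n_states = {"A": 2, "B": 1}
good, bad = -0.1, -5.0
likely = [("A", 0), ("A", 1), ("B", 0), ("B", 0)]  # best-matching state per frame
emis = [{(e, s): (good if (e, s) == k else bad)
         for e in n_states for s in range(n_states[e])} for k in likely]
lm = {(p, e): math.log(0.5) for p in n_states for e in n_states}
segments = decode_events(emis, n_states, lm)
```

On this toy input the search should recover event A on frames 0 to 1 followed by event B on frames 2 to 3. Storing a boundary flag with each backpointer is a design choice of the sketch; it lets the backtracking pass read off the event boundaries directly, even for events with a single state.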

In practical applications, this particularly efficient search algorithm can reduce the number of states to be computed by a ratio of 1:1 000 [36]. The overall approach integrates knowledge from different levels of the hierarchy to avoid premature wrong decisions.

7.4 Ensemble Learning

Up to now, a number of learning algorithms have been presented. To benefit from their diverse advantages, one can aim at a synergistic heterogeneous combination of them. Alternatively, or in addition, a homogeneous combination of the same
