The multimodal identification results are shown in Tables 16-3 and 16-4,
where we observe the equal error rates at varying levels of acoustic noise.
Table 16-3 presents the equal error rates obtained for audio-lip fusion, which
is based on concatenative data fusion and a two-stream HMM structure (a
minimal sketch of the concatenation step follows this paragraph). Although
the performance figures for the audio-lip streams do not show a significant
improvement and remain well below audio-only performance, these streams
bring some independent information to the decision fusion through audio-lip
correlations, especially under environmental noise. Decision fusion results
are presented in Table 16-4, where the summation, winner-take-all (WTAll),
and Bayesian decision fusion techniques are evaluated.
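As a minimal sketch of the concatenative step, assuming frame-aligned audio and lip feature streams; the feature dimensions and the NumPy-based form below are illustrative, not the chapter's implementation:

import numpy as np

def concatenative_fusion(audio_feats: np.ndarray, lip_feats: np.ndarray) -> np.ndarray:
    """Frame-synchronous feature concatenation.

    audio_feats: (T, Da) array of per-frame audio features (e.g., MFCCs).
    lip_feats:   (T, Dl) array of per-frame lip features, assumed already
                 interpolated to the audio frame rate.
    Returns a (T, Da + Dl) observation sequence for a single joint stream.
    """
    if audio_feats.shape[0] != lip_feats.shape[0]:
        raise ValueError("streams must be frame-aligned")
    return np.hstack([audio_feats, lip_feats])

# Example: 100 frames of 13-dim audio features and 8-dim lip features.
fused = concatenative_fusion(np.random.randn(100, 13), np.random.randn(100, 8))
print(fused.shape)  # (100, 21)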
Decision fusion techniques significantly improve identification rates because
they exploit the independence between the different modalities. Table 16-4
makes clear that the summation rule suffers under low-SNR conditions.
Although the weights in the summation rule are chosen to be optimal for the
measured modality reliabilities, the weak improvement under noise is mainly
due to variations of those reliabilities under adverse environmental
conditions. WTAll decision fusion, by contrast, favors the most confident
modality and therefore performs better in low-SNR conditions. Multilevel
Bayesian decision fusion goes further: it favors a sufficiently confident
modality provided that it stands higher in the reliability ordering, which
brings additional improvement across all SNR conditions. In the multilevel
Bayesian fusion, the reliability ordering of the modalities decreases from
left to right. For example, in the Bayesian
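The chapter's exact weights and Bayesian formulation are not reproduced here, so the following Python sketch is only illustrative: the function names, the margin-based confidence measure, and the reading of multilevel Bayesian fusion as a reliability-ordered cascade with a summation fallback are assumptions, not the authors' implementation.

import numpy as np

def confidence(scores: np.ndarray) -> float:
    """A simple reliability measure: the gap between the best and
    second-best class scores (the chapter's exact measure may differ)."""
    top2 = np.sort(scores)[-2:]
    return float(top2[1] - top2[0])

def summation_fusion(scores: dict, weights: dict) -> int:
    """Summation rule: reliability-weighted sum of per-class scores."""
    combined = sum(weights[m] * s for m, s in scores.items())
    return int(np.argmax(combined))

def wtall_fusion(scores: dict) -> int:
    """Winner-take-all: decide with the single most confident modality."""
    best = max(scores, key=lambda m: confidence(scores[m]))
    return int(np.argmax(scores[best]))

def multilevel_fusion(scores: dict, reliability_order: list, threshold: float) -> int:
    """A simplified cascade reading of multilevel Bayesian fusion: walk the
    modalities from most to least reliable and accept the first one that is
    confident enough; otherwise fall back to an unweighted summation."""
    for m in reliability_order:
        if confidence(scores[m]) >= threshold:
            return int(np.argmax(scores[m]))
    return summation_fusion(scores, {m: 1.0 for m in scores})

# Example with three hypothetical modality score vectors over 4 speakers.
scores = {
    "audio":     np.array([0.10, 0.55, 0.20, 0.15]),
    "lip":       np.array([0.30, 0.25, 0.25, 0.20]),
    "audio-lip": np.array([0.15, 0.45, 0.25, 0.15]),
}
weights = {"audio": 0.6, "lip": 0.2, "audio-lip": 0.2}
print(summation_fusion(scores, weights))                              # 1
print(wtall_fusion(scores))                                           # 1
print(multilevel_fusion(scores, ["audio", "audio-lip", "lip"], 0.2))  # 1

The cascade in multilevel_fusion mirrors the behavior described above: a modality earlier in the reliability ordering wins whenever its own decision is confident enough, so a reliable and confident modality is never outvoted by less reliable ones.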