A Likelihood-Maximizing Framework for Enhanced In-Car Speech Recognition Based on Speech Dialog System Interaction - Digital Signal Processing for In-Vehicle Systems and Safety

resulting enhancement parameters being suboptimal since optimization is

performed on the wrong state models. In turn, suboptimal enhancement parameters

could lead to further decreases in accuracy in the subsequent decoding state. This

effect is particularly likely when the number of incorrectly labeled frames exceeds

the number of correctly labeled frames, as may be the case in high-noise conditions.

10.2.2.3 Proposed Dialog-Based LIMA Framework

Having identified this problem with the existing LIMA frameworks, we propose to

exploit a confirmation-based speech dialog system to drive optimization. Dialog

systems requiring users to verify commands with simple “Yes/No” replies are a

well-established mechanism in voice recognition applications. A block diagram of

the proposed framework within the dialog exchange is shown in Fig. 10.1.

This system mimics the calibrated and unsupervised frameworks by performing

an initial decode using default enhancement parameter values in the feature

extraction stage. This framework differs from previous work following the initial ASR

pass. Instead of immediately performing optimization, the hypothesized word

sequence is first verified through the grounding process which is required in the

dialog system in order to detect any misrecognition errors which need to be

corrected prior to executing a desired action such as determining route navigation.

Since it is cumbersome for the dialog manager to request confirmation from the

user after each response, grounding often occurs once the dialog system has

gathered a number of pieces of information, for example the suburb, street name,

and number of a destination address. If the user states that the information

is incorrect, the dialog manager will attempt to recover from these errors by

either asking for corrections to specific information or restarting the dialog

transaction. In this instance, the enhancement parameters remain unaltered.
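
The deferred grounding described above can be sketched as follows. This is a minimal illustration, not the authors' dialog manager: the function names `next_prompt` and `handle_reply`, the prompt wording, and the choice to restart the whole transaction on rejection (rather than correct a single slot) are all assumptions. Only the slot names come from the destination-address example in the text.

```python
# Slot names follow the destination-address example in the text.
REQUIRED_SLOTS = ("suburb", "street", "number")


def next_prompt(slots: dict) -> str:
    """Ask for the first missing slot; confirm only once all are filled."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return f"Please say the {name}."
    # All slots gathered: issue a single Yes/No grounding question.
    return "You said {number} {street}, {suburb}. Is that correct?".format(**slots)


def handle_reply(slots: dict, reply: str) -> dict:
    """Keep the slots on "yes"; otherwise restart the transaction (assumed policy)."""
    if reply.strip().lower() == "yes":
        return slots
    return {}
```

A rejection here only resets the dialog state; nothing in this sketch touches the enhancement parameters, which matches the behaviour stated in the text.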

It is also possible to incorporate knowledge of the state of the car environment to

alter the enhancement parameters should the noise condition change drastically

between optimizations. The purpose of this chapter is not to suggest how this should be

done but to analyze the performance of existing and proposed LIMA frameworks and

make recommendations on how these are best utilized in automotive environments.

When the user confirms the information to be correct, this affirmation is fed back

to the dialog manager for further processing (e.g., a call to an external information

source such as the navigation system) but also triggers the optimization of the

enhancement parameters. To interface the optimization process with the grounding

procedure, the user responses and the hypothesized state sequences must be

stored, as reflected in Fig. 10.1. On confirmation, this stored information is used

in the optimization process; on rejection, the stored state sequence is unreliable,

so the memory can be cleared in preparation

for responses in the error-recovery stage.
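
The store-then-gate logic of this paragraph might look roughly like the sketch below. It is written under several assumptions: the class name `ConfirmationGatedOptimizer`, the `optimize` callback signature, and the representation of an utterance as an (audio, state sequence) pair are hypothetical, and the actual likelihood-maximizing update is left to the caller. Clearing the buffer after a successful optimization is also an assumption; the text only states that it is cleared on rejection.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Hypothetical record: audio samples plus the state sequence
# hypothesized by the initial ASR pass.
Utterance = Tuple[Sequence[float], Sequence[int]]


@dataclass
class ConfirmationGatedOptimizer:
    """Buffers hypotheses until the user confirms or rejects them."""

    # Hypothetical callback: (current params, buffered utterances) -> new params.
    optimize: Callable[[List[float], List[Utterance]], List[float]]
    params: List[float]
    buffer: List[Utterance] = field(default_factory=list)

    def store(self, audio: Sequence[float], state_sequence: Sequence[int]) -> None:
        """Remember a user response and its hypothesized state sequence."""
        self.buffer.append((audio, state_sequence))

    def on_grounding(self, confirmed: bool) -> bool:
        """Run optimization only on confirmation; return True if it ran."""
        ran = False
        if confirmed and self.buffer:
            self.params = self.optimize(self.params, list(self.buffer))
            ran = True
        # On rejection the stored sequences are unreliable, so they are dropped.
        # Clearing after confirmation as well is an assumption of this sketch.
        self.buffer.clear()
        return ran
```

In use, the dialog manager would call `store` after each decoded response and `on_grounding` once the Yes/No reply has been recognized, so the parameters change only when the transcription has been verified.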

The primary advantage of the proposed dialog-based LIMA framework is that

optimization never takes place on inaccurate transcription hypotheses, which

overcomes the limitation of the unsupervised framework. Another advantage is the
