Digital Signal Processing Reference
resulting enhancement parameters being suboptimal since optimization is
performed on the wrong state models. In turn, suboptimal enhancement parameters
could lead to further decreases in accuracy in the subsequent decoding stage. This
effect is particularly likely when the number of incorrectly labeled frames exceeds
the number of correctly labeled frames, as may be the case in high-noise conditions.
10.2.2.3 Proposed Dialog-Based LIMA Framework
Having identified a problem with the existing LIMA frameworks, we propose to
exploit a confirmation-based speech dialog system to drive optimization. Dialog
systems requiring users to verify commands with simple “Yes/No” replies are a
well-established mechanism in voice recognition applications. A block diagram of
the proposed framework within the dialog exchange is shown in Fig. 10.1.
This system mimics the calibrated and unsupervised frameworks by performing
an initial decode using default enhancement parameter values in the feature
extraction stage. It differs from previous work in what follows the initial ASR
pass. Rather than optimizing immediately, the system first verifies the hypothesized
word sequence through the grounding process, which the dialog system requires
in order to detect any misrecognition errors that need to be
corrected before executing a desired action such as determining route navigation.
Since it is cumbersome for the dialog manager to request confirmation from the
user after each response, grounding often occurs once the dialog system has
gathered several pieces of information, for example the suburb, street name,
and number of a destination address. If the user states that the information
is incorrect, the dialog manager attempts to recover from the errors by
either asking for corrections to specific information or restarting the dialog
transaction. In this case, the enhancement parameters remain unaltered.
It is also possible to incorporate knowledge of the state of the car environment to
alter the enhancement parameters should the noise condition change drastically
between optimizations. The purpose of this chapter is not to suggest how this should be
done but to analyze the performance of existing and proposed LIMA frameworks and
make recommendations on how these are best utilized in automotive environments.
When the user confirms the information to be correct, this affirmation is fed back
to the dialog manager for further processing (e.g., a call to an external information
source such as the navigation system) but also triggers the optimization of the
enhancement parameters. To interface the optimization process with
the grounding procedure, the user responses and the hypothesized state
sequences must be stored, as reflected in Fig. 10.1. On confirmation, this
stored information is used in the optimization process; on rejection, the stored state
sequence is unreliable, so the memory can be cleared in preparation
for responses in the error-recovery stage.
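The confirm/reject logic described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the names GroundingBuffer, Turn, store, on_grounding, and toy_optimize are hypothetical, the "alpha" parameter is a placeholder, and toy_optimize only stands in for the real enhancement-parameter optimization. Clearing the memory after a confirmed optimization is also an assumption; the text only states that it is cleared on rejection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Params = Dict[str, float]  # enhancement parameters (hypothetical representation)


@dataclass
class Turn:
    words: List[str]   # hypothesized word sequence from the initial ASR pass
    states: List[int]  # hypothesized state sequence, one label per frame


@dataclass
class GroundingBuffer:
    """Stores hypotheses until the user confirms or rejects them."""
    params: Params
    optimize: Callable[[Params, List[Turn]], Params]
    turns: List[Turn] = field(default_factory=list)

    def store(self, words: List[str], states: List[int]) -> None:
        # Keep each response's hypothesis until grounding takes place.
        self.turns.append(Turn(list(words), list(states)))

    def on_grounding(self, confirmed: bool) -> Params:
        if confirmed and self.turns:
            # Optimize only on hypotheses the user has verified as correct.
            self.params = self.optimize(self.params, self.turns)
        # On rejection the stored sequences are unreliable, so they are dropped
        # and the parameters stay unaltered. Clearing after a confirmed
        # optimization as well is an assumption of this sketch.
        self.turns = []
        return self.params


def toy_optimize(params: Params, turns: List[Turn]) -> Params:
    # Placeholder only: nudges "alpha" in proportion to the number of frames.
    n_frames = sum(len(t.states) for t in turns)
    return {**params, "alpha": params["alpha"] + 0.01 * n_frames}
```

In use, a dialog manager would call store after each recognized response and on_grounding once the user answers the "Yes/No" confirmation prompt, passing the real optimization routine in place of toy_optimize.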
The primary advantage of the proposed dialog-based LIMA framework is that
optimization never takes place on inaccurate transcription hypotheses, which
overcomes the limitation of the unsupervised framework. Another advantage is the