10.3.4 Likelihood-Maximization Frameworks
The AVICAR database enables analysis of LIMA frameworks based on speaker or
noise calibration as well as a combination of both. The following LIMA frameworks
have been tested:
• Calibrated LIMA framework using optimization on a noise-by-noise basis
• Calibrated LIMA framework using optimization on a speaker-by-speaker basis
under a single, randomly chosen noise condition
• Calibrated LIMA framework using optimization for each speaker in each noise
condition (i.e., matched conditions)
• Proposed dialog-based LIMA framework without calibration
• Proposed dialog-based LIMA framework with a single calibration utterance in a
random noise condition
• Proposed dialog-based LIMA framework with a single calibration utterance in
the idle noise condition
The unsupervised LIMA frameworks were not assessed in this chapter as the
overall performance of the speech recognizer is low (less than 50% average word
accuracy), making the hypothesis transcriptions (and therefore the optimized
parameters) unreliable.
Each calibrated LIMA framework used a single, randomly generated utterance
treated as an adaptation session. For the noise-only calibration framework, a random
utterance from a random speaker was chosen for each experimental fold in the
evaluation protocol. For speaker-based calibration (applied in both calibrated and
dialog frameworks), a single utterance from a random noise condition was used for
each speaker, with the remaining utterances ordered randomly to simulate realistic
driving conditions.
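The speaker-based session construction described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the Utterance record, the build_speaker_sessions function, and the noise labels used in any example data are hypothetical names introduced here. It assumes the noise condition is drawn uniformly first and the calibration utterance is then drawn from that condition.

```python
import random
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Utterance(NamedTuple):
    # Hypothetical record layout; the real corpus metadata may differ.
    speaker: str
    noise: str
    utt_id: str


def build_speaker_sessions(
    utterances: List[Utterance], seed: int = 0
) -> Dict[str, Tuple[Utterance, List[Utterance]]]:
    """Return {speaker: (calibration_utterance, shuffled_remaining_utterances)}."""
    rng = random.Random(seed)  # seeded so that a session can be reproduced
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt.speaker].append(utt)

    sessions = {}
    for speaker, utts in by_speaker.items():
        # Assumption: pick a noise condition at random, then one utterance from it.
        noise = rng.choice(sorted({u.noise for u in utts}))
        calibration = rng.choice([u for u in utts if u.noise == noise])
        # The remaining utterances are shuffled across noise conditions,
        # mimicking the randomly ordered test utterances in the text.
        remaining = [u for u in utts if u is not calibration]
        rng.shuffle(remaining)
        sessions[speaker] = (calibration, remaining)
    return sessions
```

Fixing the seed keeps the calibration choice and the utterance order identical across repeated runs, which is convenient when comparing frameworks on the same simulated session.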
The proposed dialog system was run using no prior calibration, and optimization
occurred every time the decoder correctly recognized all ten digits in the phone
number. Utterances which occur prior to the first optimization exhibit the same
performance as the static MFNS system and are therefore ignored in the final
evaluation (N.B. this is why baseline results differ across the experiments).
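The triggering and scoring rule in this paragraph can be sketched as a simple session loop. This is a hedged illustration under stated assumptions: run_dialog_session, decode, and optimize are hypothetical names, the recognizer and the parameter optimizer are passed in as placeholder callables, and correctness is simulated by comparing the hypothesis with the reference digit string. The sketch also assumes that the utterance that triggers the first optimization is itself excluded from scoring, since it was decoded before any optimization took place.

```python
from typing import Callable, List, Sequence, Tuple


def run_dialog_session(
    utterances: Sequence[Tuple[str, str]],       # (audio_id, reference_digits)
    decode: Callable[[str, dict], str],          # placeholder recognizer
    optimize: Callable[[str, str, dict], dict],  # placeholder parameter update
    params: dict,
) -> Tuple[List[Tuple[str, str]], int]:
    """Return the scored (reference, hypothesis) pairs and the optimization count."""
    scored = []
    n_optimizations = 0
    for audio_id, reference in utterances:
        hypothesis = decode(audio_id, params)
        # Utterances decoded before the first optimization behave like the
        # static system, so they are left out of the evaluation.
        if n_optimizations > 0:
            scored.append((reference, hypothesis))
        # Optimization is triggered only when all ten digits are correct.
        if len(reference) == 10 and hypothesis == reference:
            params = optimize(audio_id, reference, params)
            n_optimizations += 1
    return scored, n_optimizations
```

Because the number of excluded utterances depends on when the first fully correct phone number occurs, the scored subset changes from one experiment to another, which is consistent with the note that baseline results differ across the experiments.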
In order to simulate a priori knowledge relating to previously optimized
enhancement parameters, the dialog-based framework was also tested using an
initial adaptation utterance which was either randomly chosen or from the idle
condition. The idle condition was chosen because it is a likely scenario in which users first
communicate with the in-car speech dialog system, for instance to enter a
destination address before setting off on the journey. Again, all utterances which
occurred prior to the first subsequent optimization (excluding calibration) were
ignored in the evaluation.