Digital Signal Processing Reference
Table 10.4 ASR results for all LIMA frameworks

Framework                     IDL    35U    35D    55U    55D
Baseline                      79.1   55.8   42.1   49.8   27.6
a(k) = 1                      81.8   53.9   41.6   51.7   30.1
Proposed dialog system        82.6   55.9   42.3   53.1   31.1

Baseline                      80.7   55.5   43.3   49.5   28.6
a(k) = 1                      81.4   53.3   45.3   50.0   33.6
Calibrated system (random)    82.5   55.7   46.4   52.5   33.3
Proposed dialog (random)      82.3   57.7   45.5   52.7   32.3

Baseline                      80.4   57.7   44.7   53.3   28.4
a(k) = 1                      82.2   52.5   42.9   53.9   30.3
Calibrated system (IDL)       82.4   55.4   44.6   54.9   31.0
Proposed dialog (IDL)         82.9   55.9   46.0   55.5   30.9

system without enhancement being 16.7% in the idle condition. This result
demonstrates the framework's potential to improve ASR accuracy, since utterances
spoken during idle are the most likely to trigger the optimization process.
Compared with the baseline enhancement system, the proposed framework shows
relative improvements of between 1.2% and 4.4% in this mode of operation.
The calibration-only LIMA framework also shows noticeable improvements,
particularly when calibration is performed during idle. In this case, the relative
improvements range from 1.2% to 2.8% (excluding the marginal decrease in
performance in the 55D noise condition). Given that most users will first speak to
an in-car dialog system when entering their vehicle, this result suggests that the
proposed framework could be combined with a calibration session to produce
further improvements in system performance.
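
The relative improvements discussed above can be computed directly from the accuracy figures in Table 10.4. The following is a minimal sketch using the first group of rows; the function and variable names are illustrative assumptions, and this excerpt does not state exactly which pairs of rows the quoted ranges compare, so the output should be read as a worked example rather than a reproduction of those ranges.

```python
# Accuracy figures copied from the first group of rows in Table 10.4.
# All names below are hypothetical and chosen for illustration only.
CONDITIONS = ["IDL", "35U", "35D", "55U", "55D"]

RESULTS = {
    "Baseline": [79.1, 55.8, 42.1, 49.8, 27.6],
    "Proposed dialog system": [82.6, 55.9, 42.3, 53.1, 31.1],
}


def relative_improvement(reference, candidate):
    """Return the change of candidate over reference, in percent."""
    return 100.0 * (candidate - reference) / reference


def compare(reference_name, candidate_name):
    """Map each noise condition to its relative improvement (one decimal)."""
    reference = RESULTS[reference_name]
    candidate = RESULTS[candidate_name]
    return {
        condition: round(relative_improvement(ref, cand), 1)
        for condition, ref, cand in zip(CONDITIONS, reference, candidate)
    }


if __name__ == "__main__":
    gains = compare("Baseline", "Proposed dialog system")
    for condition, gain in gains.items():
        print(f"{condition}: {gain:+.1f}%")
```

For the idle (IDL) column this gives roughly 4.4%, which is consistent with the upper figure quoted for the proposed framework against the baseline.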
In the proposed dialog-based system, there is a risk of losing generality if the
same noise condition is optimized repeatedly in succession (as per the results in
Table 10.2). The consistent improvements in Table 10.4, however, indicate that
this is not an issue: regular changes in noise conditions seem to allow the
optimization process to track the internal noise conditions effectively and set
the enhancement parameters appropriately.
10.5 Conclusions
This chapter has reviewed likelihood-maximizing frameworks using Mel-filterbank
noise subtraction for in-car speech recognition. A new LIMA framework based on a
user-confirmation speech dialog system has been proposed. This framework has been
evaluated against calibrated LIMA frameworks utilizing different adaptation scenarios.
Experiments have shown that with the proposed LIMA framework, minimal
optimization is required for the best average recognition performance in car
environments. This permits pseudo real-time operation of LIMA frameworks whilst