suppression of the postfilter. However, the TAP system exhibits decreased performance when there is background noise but no echo signal ($\mathrm{SNR} < \infty$, $\mathrm{SER} \to \infty$); this may indicate that the postfilter operates suboptimally in the absence of LEM excitation.
When judging the SNR dependence of the TAP system, note the following: since the test speech files were close-talk recordings made in a vehicle environment, they are not entirely free of background noise. As a consequence, the SNR values shown in Table 7.1 are biased towards higher values, as they reflect only the amount of noise added artificially.
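The direction of this bias can be illustrated with a small simulation. In the following sketch, artificial noise is mixed into a "clean" recording that already contains some residual noise. It rests on several assumptions. A sine wave stands in for speech, and the residual cabin-noise level is invented. SNR is computed from plain mean power rather than from an active speech level such as the one defined in ITU-T P.56 [8]. The function names `mix_at_snr` and `power_ratio_db` are hypothetical.

```python
import numpy as np


def mix_at_snr(recording, noise, target_snr_db):
    """Scale `noise` so that mean-power(recording) / mean-power(scaled noise)
    equals `target_snr_db`; return (noisy mix, scaled noise)."""
    p_rec = np.mean(recording ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_rec / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = gain * noise
    return recording + scaled, scaled


def power_ratio_db(p_signal, p_noise):
    """Power ratio in dB (plain mean power, not a P.56 active speech level)."""
    return 10.0 * np.log10(p_signal / p_noise)


rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)            # stand-in for the talker's speech (assumption)
residual = 0.05 * rng.standard_normal(fs)       # hypothetical cabin noise already in the close-talk file
recording = speech + residual                   # what the "clean" test file actually contains

noisy, added = mix_at_snr(recording, rng.standard_normal(fs), target_snr_db=10.0)

# Nominal SNR: only the artificially added noise is counted as noise
nominal = power_ratio_db(np.mean(recording ** 2), np.mean(added ** 2))
# Effective SNR: true speech power versus all noise actually present
effective = power_ratio_db(np.mean(speech ** 2), np.mean((residual + added) ** 2))
print(f"nominal SNR:   {nominal:.2f} dB")
print(f"effective SNR: {effective:.2f} dB")
```

Because the residual noise in the recording is counted as "signal" in the nominal figure, the effective SNR comes out below the 10 dB target. This matches the upward bias of the reported SNR values described above.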
7.7 Conclusion
We have investigated the performance of a so-called TAP system, which tolerates imperfect user behavior when initiating a speech dialog. As in [3], we have demonstrated that the TAP system significantly improves recognition performance, assuming that half of the users actuate the push-to-speak button shortly after they have started speaking. This is achieved by means of two synchronized circular buffers, which provide a look-back capability, and a robust speech onset detection. We have included an AEC and noise reduction unit operating in the frequency domain to eliminate the loudspeaker signal as well as the background noise leaking into the microphone. Further investigations will include AEC for multichannel source signals as well as improved methods to measure the SNR and SER. In addition, more complex ASR tasks will be evaluated using the TAP system.
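The look-back mechanism can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation. We assume that one buffer holds the microphone signal and the other holds the loudspeaker reference needed by the AEC. We also assume that both buffers are written block by block in lockstep, which keeps them synchronized. In place of the chapter's robust onset detector, we substitute a simple frame-energy threshold. `LookBackBuffer`, `detect_onset`, the 1 s look-back window, and the 9 dB threshold are all hypothetical.

```python
import numpy as np


class LookBackBuffer:
    """Fixed-capacity circular buffer that returns its most recent samples in time order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = np.zeros(capacity)
        self.write_pos = 0   # index where the next sample will be written
        self.count = 0       # number of valid samples currently stored

    def push(self, block):
        block = np.asarray(block, dtype=float)[-self.capacity:]  # only the newest samples can survive
        idx = (self.write_pos + np.arange(len(block))) % self.capacity
        self.data[idx] = block
        self.write_pos = (self.write_pos + len(block)) % self.capacity
        self.count = min(self.count + len(block), self.capacity)

    def last(self, n):
        """Return the newest min(n, count) samples, oldest first."""
        n = min(n, self.count)
        idx = (self.write_pos - n + np.arange(n)) % self.capacity
        return self.data[idx].copy()


def detect_onset(x, frame_len, threshold_db=9.0, noise_frames=5):
    """Hypothetical energy-threshold onset detector: start index of the first frame whose
    energy exceeds the noise floor (mean of the first `noise_frames` frames) by `threshold_db`."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1) + 1e-12
    floor = np.mean(energy[:noise_frames])
    above = np.flatnonzero(10.0 * np.log10(energy / floor) > threshold_db)
    return int(above[0]) * frame_len if above.size else None


fs, block_len = 8000, 160
mic_buf = LookBackBuffer(2 * fs)   # microphone signal
ref_buf = LookBackBuffer(2 * fs)   # loudspeaker reference for the AEC (assumption)

rng = np.random.default_rng(1)
mic = 0.01 * rng.standard_normal(2 * fs)          # background noise only ...
speech_start = 12000                               # ... until the user starts talking at 1.5 s
mic[speech_start:] += np.sin(2 * np.pi * 300 * np.arange(2 * fs - speech_start) / fs)
ref = np.zeros(2 * fs)                             # no loudspeaker output in this toy example

# Stream both signals block by block; writing them together keeps the buffers aligned.
for i in range(0, 2 * fs, block_len):
    mic_buf.push(mic[i:i + block_len])
    ref_buf.push(ref[i:i + block_len])

# The push-to-speak button is pressed at 2.0 s, i.e. after the user started talking.
look_back = fs                                     # hypothetical 1 s look-back window
mic_hist = mic_buf.last(look_back)
ref_hist = ref_buf.last(look_back)
onset = detect_onset(mic_hist, block_len)
print("onset within look-back window at sample", onset)
```

With these toy signals, the detector places the onset 4000 samples into the look-back window, i.e. 0.5 s before the button press. A recognizer could then be fed from that point rather than from the moment of actuation. Because both buffers are written together, the same indices also retrieve the matching loudspeaker reference for the echo canceller.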
References
1. Shozakai M, Nakamura S, Shikano K (1998) Robust speech recognition in car environments. In: Proceedings of ICASSP'98, Seattle, pp 269-272
2. Matassoni M, Omologo M, Zieger C (2003) Experiments of in-car audio compensation for hands-free speech recognition. In: 2003 IEEE workshop on automatic speech recognition and understanding, pp 369-374
3. Fodor B, Scheler D, Suhadi S, Fingscheidt T (2009) Talk-and-Push (TAP) - towards more natural speech dialog initiation. In: AES 36th international conference, Dearborn
4. Enzner G, Vary P (2006) Frequency-domain adaptive Kalman filter for acoustic echo control in hands-free telephones. Signal Process 86(6):1140-1156, Elsevier
5. Scalart P, Filho J (1996) Speech enhancement based on a priori signal to noise estimation. In: Proceedings of ICASSP'96, Atlanta, pp 629-632
6. Ephraim Y, Malah D (1984) Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans Acoustics Speech Signal Process 32(6):1109-1121
7. Moreno A, Lindberg B, Draxler C, Richard G, Choukri K, Euler S, Allen J (2000) SpeechDat-Car: a large database for automotive environments. In: Proceedings of LREC 2000, Athens
8. International Telecommunication Union (1993) ITU-T recommendation P.56