A Novel Way to Start Speech Dialogs in Cars by Talk-and-Push (TAP) - Digital Signal Processing for In-Vehicle Systems and Safety


Recognizing the undegraded set of test utterances with the trained HMM set yielded a word error rate (WER) of 0.59%, which represents a lower bound for the remaining recognition experiments.
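The chapter reports recognition accuracy as WER. WER is usually computed as WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit-distance alignment of the recognized word string against a reference of N words. The following minimal sketch implements that standard definition. The function name and the example utterance are hypothetical, and this section does not state which scoring tool was used for the experiments. Over a whole test set, errors and reference words are summed across utterances before dividing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / N * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits aligning ref[:i] with hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion of a reference word
                dist[i][j - 1] + 1,        # insertion of a hypothesis word
                dist[i - 1][j - 1] + sub,  # match (0) or substitution (1)
            )
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: "seven" -> "eleven" (substitution), "two" dropped (deletion)
print(word_error_rate("call three seven two nine", "call three eleven nine"))  # → 40.0
```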

7.6 Results

Our experimental results are summarized in Table 7.1, which lists the obtained WERs in % for different disturbance conditions. In case (a), the echo signal was music, whereas in case (b), the echo signal was speech. For reference, the row labeled “Muting” contains the results obtained with the baseline system. Since this system was assumed to mute the car loudspeakers instantly upon receiving a PTS event, its performance is independent of echo type and SER. Note that the baseline results must be interpreted with care, as they strongly depend on the timing of the PTS event relative to the SOU. If, in practice, more than the assumed 50% of speakers start speaking after PTS actuation, the baseline will perform better. Nevertheless, an actual state-of-the-art system may suffer from additional impediments not considered here: for example, the muting of the loudspeakers will occur with additional delay, and the beep would not be omitted in practice.
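The disturbance conditions are parameterized by SNR and SER, i.e., the power ratio in dB of speech to noise and of speech to echo. The sketch below shows one common way to generate such conditions. It assumes that each disturbed utterance is formed by additively mixing the clean speech with a scaled echo signal and a scaled noise signal, with powers averaged over the whole signal. The function names, the 8 kHz rate and the synthetic stand-in signals are hypothetical and are not taken from the chapter. An infinite ratio (∞ in Table 7.1) is represented by `math.inf` and simply omits that disturbance.

```python
import math
import random


def power(x):
    """Mean power (average squared amplitude) of a list of samples."""
    return sum(v * v for v in x) / len(x)


def scale_to_ratio(speech, disturbance, ratio_db):
    """Scale `disturbance` so that 10*log10(P_speech / P_disturbance) == ratio_db.

    ratio_db = math.inf means the disturbance is absent (returned as all zeros).
    """
    if math.isinf(ratio_db):
        return [0.0] * len(disturbance)
    target_power = power(speech) / 10.0 ** (ratio_db / 10.0)
    gain = math.sqrt(target_power / power(disturbance))
    return [gain * v for v in disturbance]


def mix(speech, echo, noise, ser_db, snr_db):
    """Additively combine speech with echo at ser_db and noise at snr_db (equal lengths)."""
    if not (len(speech) == len(echo) == len(noise)):
        raise ValueError("signals must have equal length")
    e = scale_to_ratio(speech, echo, ser_db)
    n = scale_to_ratio(speech, noise, snr_db)
    return [s + ei + ni for s, ei, ni in zip(speech, e, n)]


# Hypothetical one-second signals at an assumed 8 kHz sampling rate
random.seed(0)
fs = 8000
speech = [math.sin(2 * math.pi * 440 * k / fs) for k in range(fs)]  # stand-in utterance
echo = [random.gauss(0.0, 1.0) for _ in range(fs)]                  # stand-in loudspeaker echo
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]                 # stand-in car noise
y = mix(speech, echo, noise, ser_db=5.0, snr_db=10.0)
```

Some evaluation setups measure speech power only over active speech segments. Doing so would change the computed gains but not the structure of the mixing.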

The results in Table 7.1 show that the TAP system outperforms the reference system under all test conditions. In the absence of noise (SNR → ∞), the TAP system yields WERs of 0.73% to 2.29%, which is much closer to the limit of 0.59% than the 4.20% WER obtained in the reference case. Moreover, the dependence on the SER is negligible for SER < ∞, indicating that the AEC works reliably even when there is noise. This seems to be a major advantage over the NLMS algorithm when considering the results obtained in [ 3 ] and might be attributed to the residual echo
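The comparison above refers to the NLMS algorithm evaluated in [ 3 ]. To make that baseline concrete, here is a minimal textbook NLMS echo canceller. It adapts an FIR estimate w of the loudspeaker-to-microphone echo path and outputs the error e[n] = d[n] - wᵀx[n], updating w ← w + μ e[n] x[n] / (ε + ‖x[n]‖²). This is not the AEC of the TAP system. The filter length, step size μ, regularization ε and the synthetic echo path h are illustrative assumptions rather than parameters from [ 3 ].

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=64, step=0.5, eps=1e-6):
    """Normalized LMS echo canceller; returns e[n] = mic[n] - w^T x[n] for each sample."""
    w = [0.0] * num_taps      # adaptive estimate of the echo path
    x_buf = [0.0] * num_taps  # most recent far-end (loudspeaker) samples, newest first
    out = []
    for x_n, d_n in zip(far_end, mic):
        x_buf = [x_n] + x_buf[:-1]                          # shift in the newest sample
        y_hat = sum(wi * xi for wi, xi in zip(w, x_buf))    # echo estimate
        e_n = d_n - y_hat                                   # echo-cancelled output
        mu_eff = step * e_n / (eps + sum(xi * xi for xi in x_buf))  # normalized step
        w = [wi + mu_eff * xi for wi, xi in zip(w, x_buf)]  # coefficient update
        out.append(e_n)
    return out


random.seed(1)
far = [random.gauss(0.0, 1.0) for _ in range(3000)]  # white-noise stand-in for playback
h = [0.6, -0.3, 0.1]                                 # hypothetical echo path
echo = [sum(h[k] * far[n - k] for k in range(len(h)) if n >= k) for n in range(len(far))]
residual = nlms_echo_canceller(far, echo, num_taps=8)
p_echo = sum(v * v for v in echo[2000:]) / 1000
p_res = sum(v * v for v in residual[2000:]) / 1000
print(f"residual/echo power over last 1000 samples: {p_res / p_echo:.2e}")
```

The demo uses noise-free single talk, so `mic` contains only echo. Adding a noise signal to `mic` is a simple way to observe how additive disturbances perturb the NLMS update.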

Table 7.1 WER in % achieved with the TAP system under different SNR and SER conditions. For comparison, the performance of a state-of-the-art system employing muting is included.

SNR [dB]              -5      0      5     10     15     20      ∞
Muting             73.41  37.90  14.93   7.17   5.02   4.54   4.20

(a) Echo signal is music
SER [dB]   0       43.22  22.83  10.29   5.02   2.88   2.24   1.90
           5       42.83  22.83  10.44   4.83   2.98   2.34   1.95
          10       42.73  22.49  10.59   4.88   2.88   2.29   1.95
           ∞       43.85  24.63  11.71   6.10   3.27   2.68   0.73

(b) Echo signal is speech
SER [dB]   0       43.02  22.39  10.63   5.32   3.17   2.39   2.29
           5       43.46  22.39  10.68   4.88   3.02   2.34   2.10
          10       42.98  22.54  10.78   5.12   2.93   2.20   2.49
           ∞       43.85  24.63  11.71   6.10   3.27   2.68   0.73
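To make the comparison in Table 7.1 quantitative, the sketch below computes the relative WER reduction of TAP over the muting baseline, 100·(WER_muting - WER_TAP)/WER_muting, for every cell of the table. The values are transcribed from Table 7.1, with ∞ written as "inf". The variable and function names are hypothetical.

```python
# WER in % transcribed from Table 7.1; columns are SNR = -5, 0, 5, 10, 15, 20, inf dB.
SNR_LABELS = ["-5", "0", "5", "10", "15", "20", "inf"]
MUTING = [73.41, 37.90, 14.93, 7.17, 5.02, 4.54, 4.20]
TAP = {  # (echo type, SER in dB) -> WERs per SNR column
    ("music", "0"): [43.22, 22.83, 10.29, 5.02, 2.88, 2.24, 1.90],
    ("music", "5"): [42.83, 22.83, 10.44, 4.83, 2.98, 2.34, 1.95],
    ("music", "10"): [42.73, 22.49, 10.59, 4.88, 2.88, 2.29, 1.95],
    ("music", "inf"): [43.85, 24.63, 11.71, 6.10, 3.27, 2.68, 0.73],
    ("speech", "0"): [43.02, 22.39, 10.63, 5.32, 3.17, 2.39, 2.29],
    ("speech", "5"): [43.46, 22.39, 10.68, 4.88, 3.02, 2.34, 2.10],
    ("speech", "10"): [42.98, 22.54, 10.78, 5.12, 2.93, 2.20, 2.49],
    ("speech", "inf"): [43.85, 24.63, 11.71, 6.10, 3.27, 2.68, 0.73],
}


def relative_reduction(baseline, system):
    """Relative WER reduction in % of `system` with respect to `baseline`."""
    return 100.0 * (baseline - system) / baseline


# Key: (echo type, SER, SNR) -> relative reduction in %
reductions = {
    (echo, ser, snr): relative_reduction(m, t)
    for (echo, ser), row in TAP.items()
    for snr, m, t in zip(SNR_LABELS, MUTING, row)
}

# TAP beats muting in every cell exactly when every relative reduction is positive.
print("TAP better everywhere:", all(r > 0 for r in reductions.values()))  # → TAP better everywhere: True
worst = min(reductions, key=reductions.get)
print("smallest reduction:", worst, round(reductions[worst], 1))  # → smallest reduction: ('music', 'inf', '10') 14.9
```

Every reduction is positive, which agrees with the statement that TAP outperforms the reference system under all test conditions. The smallest gain, about 15%, occurs at SNR = 10 dB without echo (SER = ∞). In the noise-free case with SER = ∞, the reduction is about 83%.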
