Digital Signal Processing Reference
Recognizing the undegraded set of test utterances with the trained HMM set
yielded a word error rate (WER) of 0.59%, which serves as a lower bound for the
WERs in the remaining recognition experiments.
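As a reminder of what the WER figure measures, the sketch below uses the standard definition, in which the number of word substitutions, deletions, and insertions (the word-level edit distance) is divided by the number of reference words. It is a minimal illustration only: the function name and the example utterance are hypothetical and are not taken from the experiments, and the scoring tool actually used is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance (an assumption for illustration only):
# one substitution ("two" -> "too") and one deletion ("four").
wer = word_error_rate("one two three four", "one too three")
print(f"WER = {100 * wer:.2f}%")
```

For a whole test set, the error counts are usually summed over all utterances before dividing by the total number of reference words, rather than averaging per-utterance rates.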
7.6 Results
Our experimental results are summarized in Table 7.1, which lists the obtained WERs
in % for different disturbance conditions. In case (a), the echo signal was music,
whereas in case (b), the echo signal was speech. For reference, the lines labeled
“Muting” contain the results obtained with the baseline system. Since this system
was assumed to mute the car loudspeakers instantly upon receiving a PTS event, its
performance is independent of echo type and SER. Note that the baseline results must
be interpreted with care as they strongly depend on the timing of the PTS event relative
to the SOU. If, in practice, more speakers than the assumed 50% start speaking after
PTS actuation, better baseline performance will result. Nevertheless, an actual state-
of-the-art system may suffer from additional impediments not considered here: For
example, the muting of the loudspeakers will occur with additional delay; moreover,
the beep would not be omitted in practice.
The results in Table 7.1 show that the TAP system outperforms the reference
system under all test conditions. In the absence of noise (SNR → ∞), the TAP
system yields WERs of 0.73-2.29%, which is much closer to the limit of 0.59%
than the 4.20% WER obtained in the reference case. Moreover, the dependence on
the SER is negligible for SER < ∞, indicating that the AEC works reliably even
when there is noise. This seems to be a major advantage over the NLMS algorithm
when considering the results obtained in [3] and might be attributed to the residual echo
Table 7.1 WER in % achieved with the TAP system under different SNR and SER conditions.
For comparison, the performance of a state-of-the-art system employing muting is included.

                               SNR [dB]
SER [dB]      -5       0       5      10      15      20       ∞
Muting     73.41   37.90   14.93    7.17    5.02    4.54    4.20
(a) Echo signal is music
0          43.22   22.83   10.29    5.02    2.88    2.24    1.90
5          42.83   22.83   10.44    4.83    2.98    2.34    1.95
10         42.73   22.49   10.59    4.88    2.88    2.29    1.95
∞          43.85   24.63   11.71    6.10    3.27    2.68    0.73
(b) Echo signal is speech
0          43.02   22.39   10.63    5.32    3.17    2.39    2.29
5          43.46   22.39   10.68    4.88    3.02    2.34    2.10
10         42.98   22.54   10.78    5.12    2.93    2.20    2.49
∞          43.85   24.63   11.71    6.10    3.27    2.68    0.73
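The comparison between the TAP system and the muting baseline can be checked directly from the Table 7.1 entries. The sketch below computes the relative WER reduction per SNR column for one row (music echo, SER = 0 dB). The variable and function names are hypothetical, the first SNR column is assumed to be -5 dB, and the last column is assumed to be the noise-free case; relative reduction is only one possible way to summarize the table and is not a measure reported in the text.

```python
# WERs in % copied from Table 7.1.
# Columns (assumed): SNR = -5, 0, 5, 10, 15, 20 dB, and the noise-free case.
SNR_LABELS = ["-5", "0", "5", "10", "15", "20", "inf"]
MUTING = [73.41, 37.90, 14.93, 7.17, 5.02, 4.54, 4.20]
TAP_MUSIC_SER_0 = [43.22, 22.83, 10.29, 5.02, 2.88, 2.24, 1.90]


def relative_reduction(baseline, system):
    """Relative WER reduction in percent of the baseline WER, per column."""
    return [100.0 * (b - s) / b for b, s in zip(baseline, system)]


reductions = relative_reduction(MUTING, TAP_MUSIC_SER_0)
for label, value in zip(SNR_LABELS, reductions):
    print(f"SNR {label:>3} dB: {value:5.1f}% relative WER reduction")
```

A positive value in every column corresponds to the statement that the TAP system outperforms the reference system for this row; the same function can be applied to the other rows of the table.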