Digital Signal Processing Reference
In-Depth Information
[Figure: two panels, (a) and (b). Horizontal axis: smoothing factor of SLR (κ), with ticks at 0, 0.1, 0.2, ..., 0.9, 0.95, 0.98. Vertical axis: ticks from 0 to 60. Curves: silence, onset, offset, SAS.]
Figure 10.16 Analysis of the smoothing factor κ of the SLR with respect to detection
error rates; the noise level is 10 dB SNR and the noise sources are (a) vehicle and
(b) babble; SAS indicates speech active sections
at the onset regions for both vehicle and babble noisy signals. For vehicle
noisy signals, the false alarm rate in the inactive frames increases gradually
with κ for κ < 0.9, and then substantially for κ > 0.9. For babble noisy
signals, however, the error rate decreases gradually as κ increases for
κ < 0.9, and then increases for κ > 0.9, as in the vehicle noise case.
Therefore, if κ is selected properly, the SLR-based method can give
significantly improved performance over the LR-based method.
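The effect of κ on false alarms in inactive frames can be illustrated with a small sketch. It assumes that the SLR is obtained by first-order recursive smoothing of the per-frame log likelihood ratio, with κ as the forgetting factor; this form, the function names `smooth_log_lr` and `detect`, the threshold, and the toy input are all assumptions made for illustration and are not taken from the text. It is not meant to reproduce Figure 10.16.

```python
def smooth_log_lr(log_lr, kappa):
    """Recursively smooth per-frame log likelihood ratios.

    Assumed form (not taken from the text):
        s[n] = kappa * s[n-1] + (1 - kappa) * log_lr[n]
    kappa = 0 leaves the input unchanged; kappa near 1 smooths heavily.
    """
    if not 0.0 <= kappa < 1.0:
        raise ValueError("kappa must be in [0, 1)")
    smoothed = []
    prev = None
    for value in log_lr:
        # The first frame initialises the recursion with its own value.
        prev = value if prev is None else kappa * prev + (1.0 - kappa) * value
        smoothed.append(prev)
    return smoothed


def detect(log_lr, kappa, threshold=0.0):
    """Flag a frame as speech when its smoothed log LR exceeds the threshold."""
    return [s > threshold for s in smooth_log_lr(log_lr, kappa)]


# Toy input (hypothetical values): 3 inactive, 5 active, 10 inactive frames.
toy_log_lr = [-1.0] * 3 + [4.0] * 5 + [-1.0] * 10

for kappa in (0.0, 0.5, 0.9):
    decisions = detect(toy_log_lr, kappa)
    # Speech decisions in the trailing inactive frames are false alarms.
    false_alarms = sum(decisions[8:])
    print(f"kappa={kappa}: false alarms after speech end = {false_alarms}")
```

In this toy setting, heavier smoothing makes the smoothed value decay more slowly after the active frames end, so more of the trailing inactive frames are flagged as speech, which is consistent with the rise in false alarms described for large κ.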
Under various noise levels and sources, the performance of several VAD
methods has been compared, as shown in Table 10.2: SLR-based VAD [16, 17],
ITU-T G.729 Annex B VAD (G.729B) [1], ETSI AMR VAD option 2 (AMR2) [8], and
LR-based VAD with and without the hangover scheme [12]. The original AMR2
produces a detection result every 20 ms by taking the logical OR of two 10 ms
detection results, so the 10 ms results can be obtained easily by a slight
modification of the original code. Taking into account the results in
Figure 10.16, κ = 0.9 is selected for SLR-based VAD. G.729B generates
considerably higher error rates in the active regions than the other
methods. It is important to note that frequent detection errors of speech
frames lead to serious degradation in speech quality, so the error rate of
speech frame detection should be as low as possible. LR-based VAD gives
consistently superior performance to G.729B, but without the hangover scheme
it produces relatively high detection error rates in the active regions.
The hangover scheme can considerably alleviate this problem, but the speech
detection error rate is still somewhat high in comparison with the results of
both SLR-based VAD and AMR2. The performance of SLR-based VAD and AMR2
appears to be comparable.
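Two of the mechanisms mentioned above can be sketched in a few lines: merging pairs of 10 ms decisions into one 20 ms decision with a logical OR, and holding the speech decision for a few frames after the last active frame. This is a minimal illustration only. The function names and the fixed-length hold counter are assumptions; in particular, the counter is a generic stand-in and is not claimed to be the hangover scheme of [12] or the logic in the AMR2 reference code.

```python
def combine_pairs_or(flags_10ms):
    """Merge consecutive pairs of 10 ms decisions into 20 ms decisions.

    A 20 ms frame is marked as speech if either of its 10 ms halves is.
    """
    if len(flags_10ms) % 2 != 0:
        raise ValueError("expected an even number of 10 ms decisions")
    return [
        flags_10ms[i] or flags_10ms[i + 1]
        for i in range(0, len(flags_10ms), 2)
    ]


def apply_hangover(flags, hang_frames):
    """Keep the speech decision on for hang_frames frames after speech ends.

    Generic fixed-length hold (an assumption), not the scheme of [12].
    """
    held = []
    counter = 0
    for active in flags:
        if active:
            counter = hang_frames  # reload the hold counter on every speech frame
            held.append(True)
        elif counter > 0:
            counter -= 1  # spend one held frame
            held.append(True)
        else:
            held.append(False)
    return held


# Hypothetical 10 ms decisions for six frames (three 20 ms frames).
decisions_10ms = [False, True, False, False, True, True]
print(combine_pairs_or(decisions_10ms))
print(apply_hangover([True, False, False, False, True, False], hang_frames=2))
```

Because the OR discards which half of the 20 ms frame triggered the decision, comparing methods at 10 ms resolution requires reading the two 10 ms decisions before they are merged.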
 