Digital Signal Processing Reference
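where the frame statistic at time t is the average, with weight 1/K, of the
per-bin quantities over the frequency bins k = 1, ..., K, as defined in
equation (11.41). The threshold θ is set to a sufficiently small value,
i.e., θ < 1, so that speech is rarely classified as silence.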
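The decision structure described above (a per-bin quantity averaged over the K frequency bins with weight 1/K, then compared against a small threshold θ) can be sketched as follows. This is a hypothetical illustration only: the per-bin values, the direction of the comparison (silence declared when the average falls below θ), the function names, and the default θ = 0.5 are assumptions, not the definition given by equation (11.41).

```python
from typing import Sequence


def frame_statistic(per_bin: Sequence[float]) -> float:
    """Average a per-bin quantity over the K frequency bins (weight 1/K)."""
    K = len(per_bin)
    if K == 0:
        raise ValueError("need at least one frequency bin")
    return sum(per_bin) / K


def is_silence(per_bin: Sequence[float], theta: float = 0.5) -> bool:
    """Hypothetical rule: declare silence only when the averaged statistic
    falls below a small threshold theta < 1 (assumed direction), so that
    speech frames are rarely misclassified as silence."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta is assumed to satisfy 0 < theta < 1")
    return frame_statistic(per_bin) < theta
```

Lowering θ makes the silence decision stricter, which matches the stated goal of rarely labeling speech as silence.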
11.3.4 Comparisons
In order to show the robustness of the noise adaptation techniques, speech
quality is compared in terms of both SEGSNR improvement and ISD as a function
of the speech-detection error rate of the VAD (E_d). Various values of E_d are
obtained by calibrating a voice activity detector [21], and the resulting
frame-by-frame VAD decisions are then passed to each noise adaptation method.
For the experiment, 64 seconds of speech material was mixed with vehicle noise
at 5 dB SNR and then processed every 10 ms in the frequency domain by the MMSE
estimator [10] combined with each noise adaptation method. Finally, the
enhanced speech signal is obtained by applying the inverse DFT to the enhanced
spectrum, followed by the overlap-and-add procedure.
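The processing chain just described (a DFT of each frame every 10 ms, spectral modification, inverse DFT, and overlap-add) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: an 8 kHz sampling rate, 20 ms frames with 50% overlap, a square-root Hann window at both analysis and synthesis, and a caller-supplied gain function `gain_fn` (a hypothetical name) standing in for the MMSE estimator [10] and the noise adaptation methods, neither of which is implemented here.

```python
import numpy as np


def enhance_ola(noisy, gain_fn, fs=8000, frame_ms=20, hop_ms=10):
    """Frame-wise DFT enhancement with overlap-add resynthesis.

    gain_fn maps the complex spectrum Y of one frame to a real gain array
    of the same shape; it is a placeholder for the MMSE estimator.
    """
    x = np.asarray(noisy, dtype=float)
    hop = int(fs * hop_ms / 1000)       # 80 samples at the assumed 8 kHz
    n_fft = int(fs * frame_ms / 1000)   # 160 samples, i.e. 50% overlap
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    n = np.arange(n_fft)
    # Square root of a periodic Hann window, applied at analysis and at
    # synthesis: their product is a Hann window, and Hann windows shifted
    # by half their length sum to exactly 1.
    win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))
    out = np.zeros_like(x)
    for start in range(0, len(x) - n_fft + 1, hop):
        Y = np.fft.rfft(x[start:start + n_fft] * win)   # noisy spectrum
        S_hat = gain_fn(Y) * Y                           # enhanced spectrum
        # Inverse DFT of the enhanced spectrum, then overlap-add.
        out[start:start + n_fft] += np.fft.irfft(S_hat, n=n_fft) * win
    return out
```

Passing a unity gain such as `lambda Y: np.ones(Y.shape)` reproduces the input everywhere except the first and last hop, which is a quick sanity check before plugging in a real gain rule.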
The SEGSNR improvement and the ISD between the clean and enhanced speech
signals, for speech corrupted by vehicle noise at 0, 5, and 10 dB SNR, are
shown in Figures 11.15, 11.16, and 11.17, respectively. The experiments
confirm that:
- the SD-based method performs worse than both the MD- and the HD-based
  methods at low E_d;
- the HD-based method degrades significantly as E_d increases;
- the MD-based method performs robustly regardless of the VAD performance
  and outperforms both the HD- and the SD-based methods.
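The two quality measures used in these figures can be sketched as below. Assumptions: SEGSNR is taken as the mean of per-frame SNRs over non-overlapping 160-sample frames, each clipped to [-10, 35] dB (a common convention, not necessarily the one used for these figures); ISD is assumed to denote the Itakura-Saito distortion, computed here between frame DFT power spectra, whereas the experiments may have computed it from LPC spectral envelopes. All function names are hypothetical.

```python
import numpy as np


def _frames(x, frame_len, n_samples):
    """Split the first n_samples of x into non-overlapping frames."""
    n = n_samples // frame_len
    return np.asarray(x[:n * frame_len], dtype=float).reshape(n, frame_len)


def segsnr(clean, processed, frame_len=160, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNRs in dB, each clipped to [lo, hi]."""
    m = min(len(clean), len(processed))
    s = _frames(clean, frame_len, m)
    e = s - _frames(processed, frame_len, m)   # per-frame error signal
    snr = 10 * np.log10((np.sum(s**2, axis=1) + eps)
                        / (np.sum(e**2, axis=1) + eps))
    return float(np.mean(np.clip(snr, lo, hi)))


def segsnr_improvement(clean, noisy, enhanced, frame_len=160):
    """SEGSNR of the enhanced signal minus SEGSNR of the noisy input."""
    return segsnr(clean, enhanced, frame_len) - segsnr(clean, noisy, frame_len)


def itakura_saito(clean, processed, frame_len=160, eps=1e-12):
    """Frame-averaged Itakura-Saito distortion between DFT power spectra."""
    m = min(len(clean), len(processed))
    P = np.abs(np.fft.rfft(_frames(clean, frame_len, m), axis=1))**2 + eps
    Q = np.abs(np.fft.rfft(_frames(processed, frame_len, m), axis=1))**2 + eps
    r = P / Q
    # r - ln(r) - 1 >= 0, with equality only where the spectra match.
    return float(np.mean(r - np.log(r) - 1.0))
```

A higher SEGSNR improvement and a lower ISD both indicate better enhancement, which is how the curves in these figures are read.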
Note that for very low E_d, i.e., 0.0 ≤ E_d < 0.1, the MD- and HD-based
methods perform slightly worse than at E_d = 0.2. This is because the noise
estimate is adapted less frequently: the false-alarm rate of the VAD
increases, so fewer noise frames are available for adaptation. In other
words, the VAD achieves a low E_d at the expense of an increased false-alarm
rate during speech pauses.
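The trade-off described above can be made concrete with frame labels. In this sketch, E_d is assumed to be the fraction of true speech frames that the VAD misses, the false-alarm rate is the fraction of true noise frames it labels as speech, and a hard-decision adaptation is assumed to update the noise estimate only on frames labeled as non-speech; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def vad_error_rates(truth: Sequence[bool],
                    decision: Sequence[bool]) -> Tuple[float, float]:
    """truth/decision: True marks a speech frame.

    Returns (E_d, P_fa) under the assumed definitions:
    E_d  = missed speech frames / true speech frames,
    P_fa = noise frames flagged as speech / true noise frames.
    """
    if len(truth) != len(decision):
        raise ValueError("label sequences must have equal length")
    speech = [d for t, d in zip(truth, decision) if t]
    noise = [d for t, d in zip(truth, decision) if not t]
    e_d = sum(1 for d in speech if not d) / len(speech) if speech else 0.0
    p_fa = sum(1 for d in noise if d) / len(noise) if noise else 0.0
    return e_d, p_fa


def noise_update_frames(decision: Sequence[bool]) -> int:
    """Frames available for hard-decision noise adaptation (assumed to be
    exactly the frames the VAD labels as non-speech)."""
    return sum(1 for d in decision if not d)
```

For example, with truth = [True, True, False, False, False, False], the decisions [True, True, True, True, False, False] give E_d = 0.0 and a false-alarm rate of 0.5, leaving only two frames for noise adaptation, while [True, False, True, False, False, False] give E_d = 0.5 and a false-alarm rate of 0.25 but leave four.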
Results for speech corrupted by helicopter noise at 0, 5, and 10 dB SNR are
shown in Figures 11.18, 11.19, and 11.20, respectively. They exhibit
performance trends similar to those for vehicle noise, although the absolute
values differ.
In conclusion, the STSA-based spectral enhancement techniques, including the
GSS, GBSS, ML, WF, and MMSE-based algorithms combined with an estimate of the
speech presence uncertainty, each have their own advantages and
disadvantages. The MMSE-based STSA method combined with speech presence
uncertainty is perhaps the best currently available method for