Digital Signal Processing Reference
and, when mode-switching, proper coder synchronization. Provided that
these two processing stages are carried out successfully, the quality of
speech produced by a hybrid coding method is of good to toll quality at an
average bit rate of around 3.5-5 kb/s. Informal subjective listening test results
confirm that the hybrid model eliminates the limitations of the existing
single-model-based coders.
The robustness of the hybrid coding algorithm under acoustic noise and
channel error conditions is another important issue which requires significant
research effort. The difficulties specific to hybrid coders are speech
classification under background noise and the mode-bit errors caused by random
channel errors. Although the classification algorithm is capable of selecting
the best mode under noisy background conditions, there is a significant bias
towards ACELP in the presence of noise compared to clean speech conditions.
This is due to the inability of the white-noise excitation or the harmonic
excitation to encode the corrupted signals. The noisy speech synthesized using
the harmonic mode sounds metallic, which can be improved by introducing
a proper voicing mixture classification when harmonic mode is selected.
The robustness of the hybrid coder to mode errors has been tested by
simulating all the possible mode errors. The coder is capable of isolating
the mode errors and returning to normal decoding almost immediately. This is
mainly due to the independent memory reinitialization of each mode when
the coder switches to it from a different mode.
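The isolating effect of per-mode memory reinitialization can be illustrated with a toy model. The sketch below is not the coder described in the text: `ToyHybridDecoder`, its single scalar memory per mode, the leak factor of 0.5 and the helper `mode_error_trace` are all hypothetical, chosen only to show how a single mode-bit error stays confined to the current run of frames when every mode clears its memory on entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MODES = ("noise", "acelp", "harmonic")  # mode names taken from the text


@dataclass
class ToyHybridDecoder:
    """Toy decoder (assumption): one scalar memory per mode, cleared on mode entry."""
    prev_mode: Optional[str] = None
    memory: Dict[str, float] = field(
        default_factory=lambda: {m: 0.0 for m in MODES})

    def decode(self, mode: str, param: float) -> float:
        # Independent reinitialization: entering a mode from a different
        # mode discards whatever that mode's memory held before.
        if mode != self.prev_mode:
            self.memory[mode] = 0.0
        # Placeholder "synthesis": a leaky accumulator standing in for
        # filter and excitation memories (hypothetical, not a real codec).
        self.memory[mode] = 0.5 * self.memory[mode] + param
        self.prev_mode = mode
        return self.memory[mode]


def decode_all(frames: List[Tuple[str, float]]) -> List[float]:
    """Decode a sequence of (mode, parameter) frames with a fresh decoder."""
    decoder = ToyHybridDecoder()
    return [decoder.decode(mode, param) for mode, param in frames]


def mode_error_trace(frames: List[Tuple[str, float]],
                     bad_index: int, wrong_mode: str) -> List[float]:
    """Per-frame output mismatch caused by one corrupted mode decision."""
    corrupted = list(frames)
    corrupted[bad_index] = (wrong_mode, frames[bad_index][1])
    clean_out = decode_all(frames)
    bad_out = decode_all(corrupted)
    return [abs(a - b) for a, b in zip(clean_out, bad_out)]


if __name__ == "__main__":
    # Six ACELP frames followed by six harmonic frames; frame 2 is
    # misread as harmonic at the decoder.
    frames = [("acelp", 1.0)] * 6 + [("harmonic", 1.0)] * 6
    for i, err in enumerate(mode_error_trace(frames, 2, "harmonic")):
        print(i, round(err, 4))
```

In this toy setting the mismatch starts at the corrupted frame, shrinks over the rest of the ACELP run, and vanishes at the next genuine mode switch, because the memory written during the erroneous frame is discarded on entry. How quickly a real coder recovers depends on its actual filter and excitation memories, which this sketch does not model.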
Finally, it is important that each element or coding mode of the hybrid model
is redesigned with the knowledge that the noise, ACELP and harmonic
excitation models will be used during noise (or silence), transitions, and
steady-state voiced speech, respectively. In this case the LPC parameters
of the ACELP and harmonic modes will have different vector quantizer tables,
trained only on transitional and steady-state voiced speech respectively,
thus improving the quantization performance. In addition, using
the LTP in ACELP mode at the onsets may not be necessary. Instead, more
pulses with phase spreading may be used to improve quality.
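The idea of mode-specific quantizer tables can be sketched as follows. This is a minimal illustration under stated assumptions, not the design procedure of the text: the parameter vectors are toy 2-D points rather than real LPC or LSF vectors, training uses a plain Lloyd (k-means) iteration, and the names `train_codebook`, `train_per_mode` and `mean_distortion` are hypothetical.

```python
import random


def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def train_codebook(vectors, size, iterations=20, seed=0):
    """Plain Lloyd iteration; a stand-in for a real VQ design (assumption)."""
    rng = random.Random(seed)
    # Initialise with `size` distinct training vectors.
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        cells = [[] for _ in range(size)]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(dim) / len(cell) for dim in zip(*cell)]
    return codebook


def train_per_mode(labelled, size, iterations=20):
    """Train one table per mode from (mode, vector) pairs."""
    by_mode = {}
    for mode, v in labelled:
        by_mode.setdefault(mode, []).append(v)
    return {mode: train_codebook(vs, size, iterations)
            for mode, vs in by_mode.items()}


def mean_distortion(codebook, vectors):
    """Average squared error when each vector maps to its nearest codeword."""
    total = 0.0
    for v in vectors:
        c = codebook[nearest(codebook, v)]
        total += sum((a - b) ** 2 for a, b in zip(c, v))
    return total / len(vectors)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-ins for transitional and steady-state voiced frames.
    transitional = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
                    for _ in range(50)]
    voiced = [(rng.uniform(4.5, 5.5), rng.uniform(4.5, 5.5))
              for _ in range(50)]
    labelled = ([("acelp", v) for v in transitional]
                + [("harmonic", v) for v in voiced])
    tables = train_per_mode(labelled, size=4)
    print(mean_distortion(tables["harmonic"], voiced))
    print(mean_distortion(tables["acelp"], voiced))
```

With the synthetic data above, the table trained on the matching class quantizes that class with lower distortion than the table trained on the other class, which is the effect the per-mode design is meant to exploit. The size of the gain on real speech depends on how different the two parameter distributions actually are.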