Digital Signal Processing Reference
original (with increasing bit rate) or not. It is therefore more appropriate to
group speech coders into the above two groups, as the old waveform coding
terminology is no longer applicable. If required, the name hybrid coding can be
associated with coders that use more than one speech coding principle,
switching between them according to the characteristics of the input speech
signal. For example, a waveform-approximating coder, such as CELP, may be
advantageously combined with a harmonic coder, which uses a parametric coding
method, to form such a hybrid coder.
2.2.1 Parametric Coders
Parametric coders model the speech signal using a set of model parameters.
The extracted parameters at the encoder are quantized and transmitted to the
decoder. The decoder synthesizes speech according to the specified model.
The speech production model does not account for the quantization noise
or try to preserve the waveform similarity between the synthesized and the
original speech signals. The model parameter estimation may be an open loop
process with no feedback from the quantization or the speech synthesis. These
coders only preserve the features included in the speech production model,
e.g. spectral envelope, pitch and energy contour. The speech quality of
parametric coders does not converge towards the transparent quality of the
original speech with finer quantization of the model parameters (see Figure 2.1),
because of the limitations of the speech production model itself. Furthermore,
since these coders do not preserve waveform similarity, the signal-to-noise
ratio (SNR) is not a meaningful measure: it is often negative when expressed in
dB, because the input and output waveforms need not be phase aligned. The SNR
therefore does not correlate with the synthesized speech quality, which should
instead be assessed subjectively (or perceptually).
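To illustrate why SNR is misleading for parametric coders, the sketch below computes SNR in dB as 10·log10(signal energy / error energy) for a hypothetical test case: a 200 Hz tone sampled at 8 kHz over a 20 ms frame, and a "synthesized" tone with identical amplitude and frequency (hence the same magnitude spectrum) but shifted in phase by π/2. The signal values and the function name `snr_db` are illustrative assumptions, not taken from the text.

```python
import math

def snr_db(reference, test):
    """SNR in dB: 10*log10(reference energy / energy of (reference - test))."""
    signal_energy = sum(x * x for x in reference)
    noise_energy = sum((x - y) ** 2 for x, y in zip(reference, test))
    if noise_energy == 0.0:
        return math.inf  # identical waveforms
    return 10.0 * math.log10(signal_energy / noise_energy)

# Hypothetical test signal: 200 Hz tone, 8 kHz sampling, 160 samples (20 ms)
FS = 8000
F0 = 200.0
N = 160

original = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]
# Same amplitude and frequency (same magnitude spectrum), phase shifted by pi/2
synthesized = [math.sin(2 * math.pi * F0 * n / FS + math.pi / 2) for n in range(N)]

print(f"{snr_db(original, original):.2f}")     # inf
print(f"{snr_db(original, synthesized):.2f}")  # -3.01
```

Although both tones would sound the same, the phase shift alone makes the error energy twice the signal energy, so the SNR is about -3 dB; a shift of π would give about -6 dB.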
Linear Prediction Based Vocoders
Linear Prediction (LP) based vocoders are designed to emulate the human
speech production mechanism [2]. The vocal tract is modelled by a linear
prediction filter. The glottal pulses and turbulent air flow at the glottis are
modelled by periodic pulses and Gaussian noise respectively, which form
the excitation signal of the linear prediction filter. The LP filter coefficients,
signal power, binary voicing decision (i.e. periodic pulses or noise excitation),
and pitch period of the voiced segments are estimated for transmission
to the decoder. The main weakness of LP-based vocoders is the binary
voicing decision, which cannot model mixed excitation containing both
periodic and noisy components (e.g. voiced fricatives). By employing
frequency-domain voicing decision techniques, the performance of LP-based
vocoders can be improved [3].
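The decoder side of the binary-voicing model described above can be sketched as follows: the excitation is either a periodic pulse train (voiced) or Gaussian noise (unvoiced), scaled to the transmitted power, and then passed through the all-pole LP synthesis filter 1/A(z). This is a minimal single-frame sketch under stated assumptions, not the method of [2]: the function names are hypothetical, A(z) is assumed to be 1 - Σ a_k z^-k, pulses are scaled by sqrt(pitch_period) so that both excitations have mean power gain², and filter memory is not carried across frames.

```python
import math
import random

def make_excitation(voiced, pitch_period, gain, n_samples, rng):
    """Binary-voicing excitation (hypothetical helper).

    Voiced: one pulse every pitch_period samples, amplitude gain*sqrt(pitch_period),
    so the mean power is gain**2 when n_samples is a multiple of pitch_period.
    Unvoiced: white Gaussian noise with variance gain**2.
    """
    if voiced:
        amplitude = gain * math.sqrt(pitch_period)
        return [amplitude if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    return [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def lp_synthesis(excitation, a):
    """All-pole filter 1/A(z), assuming A(z) = 1 - sum_{k=1}^{p} a_k z^-k,
    i.e. y[n] = e[n] + sum_{k=1}^{p} a_k * y[n-k], with zero initial state."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y

# Usage with hypothetical parameters: 2nd-order stable resonance
# (poles at radius sqrt(0.6)), 40-sample pitch period, 160-sample frame.
rng = random.Random(0)
a = [1.3, -0.6]
voiced_frame = lp_synthesis(make_excitation(True, 40, 1.0, 160, rng), a)
unvoiced_frame = lp_synthesis(make_excitation(False, 40, 1.0, 160, rng), a)
```

Because each frame is forced to be either all pulses or all noise, a segment containing both periodic and noisy energy can only be approximated by one of the two branches, which is the weakness noted above.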