Digital Signal Processing Reference
Harmonic Coders
Harmonic or sinusoidal coding represents the speech signal as a sum of sinu-
soidal components. The model parameters, i.e. the amplitudes, frequencies
and phases of sinusoids, are estimated at regular intervals from the speech
spectrum. The frequency tracks are extracted from the peaks of the speech
spectra, and the amplitudes and frequencies are interpolated in the synthesis
process for smooth evolution [4]. The general sinusoidal model does not
restrict the frequency tracks to be harmonics of the fundamental frequency.
As the parameter extraction rate increases, the synthesized speech
waveform converges towards the original, provided the parameters are
unquantized. However, at low bit rates the phases are not transmitted but
are instead estimated at the decoder, and the frequency tracks are confined
to be harmonics. Therefore, point-to-point waveform similarity is not
preserved.
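The loss of waveform similarity can be illustrated with a minimal sketch of harmonic synthesis for a single frame. The function name, the sampling rate, the fundamental frequency, and the amplitude and phase values below are all illustrative assumptions, and setting every phase to zero is only a stand-in for whatever phase estimate a real decoder would use.

```python
import math

def synthesize_harmonics(f0, amplitudes, phases, fs, num_samples):
    """Sum of harmonically related cosines: harmonic k has frequency k * f0."""
    frame = []
    for n in range(num_samples):
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            sample += amp * math.cos(2.0 * math.pi * k * f0 * n / fs + phase)
        frame.append(sample)
    return frame

def energy(frame):
    """Sum of squared samples."""
    return sum(x * x for x in frame)

fs = 8000.0                      # sampling rate in Hz (assumed)
f0 = 100.0                       # fundamental frequency in Hz (assumed)
amps = [1.0, 0.5, 0.25]          # harmonic amplitudes (made-up values)
true_phases = [0.3, 1.2, -0.7]   # phases of the "original" frame (made-up values)

# One pitch period is fs / f0 = 80 samples.
original = synthesize_harmonics(f0, amps, true_phases, fs, 80)
# Hypothetical decoder that receives the amplitudes but not the phases.
decoded = synthesize_harmonics(f0, amps, [0.0] * len(amps), fs, 80)

max_diff = max(abs(a - b) for a, b in zip(original, decoded))
print("largest sample difference:", max_diff)
print("energy original / decoded:", energy(original), energy(decoded))
```

Over a full pitch period the two frames carry the same energy in each harmonic, yet their samples differ, which is why a sample-by-sample error measure is a poor fit for such a coder.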
2.2.2 Waveform-approximating Coders
Waveform coders minimize the error between the synthesized and the origi-
nal speech waveforms. The early waveform coders such as companded Pulse
Code Modulation (PCM) [5] and Adaptive Differential Pulse Code Mod-
ulation (ADPCM) [6] transmit a quantized value for each speech sample.
Unlike PCM, however, ADPCM employs an adaptive pole-zero predictor and
quantizes the prediction error signal with an adaptive quantizer step size.
The ADPCM predictor coefficients and the quantizer step size are backward
adaptive and are updated at the sampling rate.
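Backward adaptation means that the decoder can repeat the adaptation from the transmitted codes alone, without side information. The following is a much simplified sketch of that idea, not the algorithm of [6]: it assumes a fixed first-order predictor instead of an adaptive pole-zero predictor, a 3-bit code (sign plus 2-bit magnitude), and made-up step-size multipliers. All names and constants are hypothetical.

```python
import math

# Step-size multipliers indexed by code magnitude (illustrative, not from any standard).
MULTIPLIERS = [0.9, 0.9, 1.25, 1.75]
PRED_COEFF = 0.9                 # fixed first-order predictor (assumption)
MIN_STEP, MAX_STEP = 1e-3, 1.0   # clamp range for the step size (assumption)

class State:
    """Adaptation state kept identically by encoder and decoder."""
    def __init__(self):
        self.step = 0.05   # initial quantizer step size (assumption)
        self.prev = 0.0    # previous reconstructed sample

def reconstruct(state, code):
    """Rebuild one sample from its code, then adapt the step from the code alone."""
    sign = -1.0 if code & 4 else 1.0
    mag = code & 3
    diff = sign * (mag + 0.5) * state.step
    sample = PRED_COEFF * state.prev + diff
    state.step = min(MAX_STEP, max(MIN_STEP, state.step * MULTIPLIERS[mag]))
    state.prev = sample
    return sample

def encode(samples):
    """Return the codes and the encoder's own local reconstruction."""
    state = State()
    codes, local = [], []
    for x in samples:
        err = x - PRED_COEFF * state.prev          # prediction error
        mag = min(3, int(abs(err) / state.step))   # 2-bit magnitude
        code = mag | (4 if err < 0 else 0)         # bit 2 carries the sign
        codes.append(code)
        local.append(reconstruct(state, code))     # track what the decoder will see
    return codes, local

def decode(codes):
    state = State()
    return [reconstruct(state, c) for c in codes]

signal = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(400)]
codes, local = encode(signal)
decoded = decode(codes)
mse = sum((x - y) ** 2 for x, y in zip(signal, decoded)) / len(signal)
print("mean squared error:", mse)
```

Because the predictor input and the step size are updated only from quantities the decoder also has, the decoder output matches the encoder's local reconstruction exactly.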
The more recent waveform-approximating coders based on time-domain
analysis-by-synthesis, such as Code Excited Linear Prediction (CELP) [7],
explicitly make use of a vocal tract model and long-term prediction to model
the correlations present in the speech signal. CELP coders buffer the speech
signal, perform block-based analysis, and transmit the prediction filter
coefficients along with an index for the excitation vector. They also employ
perceptual weighting so that the quantization noise spectrum is masked by
the signal level.
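The analysis-by-synthesis principle can be sketched as an exhaustive codebook search: each candidate excitation vector is passed through the synthesis filter and the one giving the smallest squared error against the target is selected. This is only an illustration under simplifying assumptions. It omits the long-term predictor and the perceptual weighting, uses a random Gaussian codebook, and the filter coefficients, codebook size, and function names are hypothetical.

```python
import random

def synthesis_filter(excitation, lpc):
    """All-pole filter 1/A(z), A(z) = 1 - sum_i lpc[i] * z^-(i+1), zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for i, a in enumerate(lpc):
            if n - 1 - i >= 0:
                y += a * out[n - 1 - i]
        out.append(y)
    return out

def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the codevector that best matches the target."""
    best = (None, 0.0, float("inf"))
    target_energy = sum(t * t for t in target)
    for idx, vec in enumerate(codebook):
        synth = synthesis_filter(vec, lpc)
        synth_energy = sum(s * s for s in synth)
        if synth_energy == 0.0:
            continue
        corr = sum(t * s for t, s in zip(target, synth))
        gain = corr / synth_energy                       # optimal gain for this vector
        err = target_energy - corr * corr / synth_energy # squared error at that gain
        if err < best[2]:
            best = (idx, gain, err)
    return best

rng = random.Random(0)
lpc = [1.2, -0.5]   # stable second-order predictor (made-up values)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]

# Build a target that is exactly reachable: codevector 17 scaled by 0.7.
target = [0.7 * s for s in synthesis_filter(codebook[17], lpc)]

index, gain, err = search_codebook(target, codebook, lpc)
print(index, gain, err)
```

Only the selected index and gain (together with the filter coefficients) would need to be transmitted, since the decoder holds the same codebook and can rerun the synthesis filter.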
2.2.3 Hybrid Coding of Speech
Almost all of the existing speech coders apply the same coding principle,
regardless of the widely varying character of the speech signal, i.e. voiced,
unvoiced, mixed, transitions etc. Examples include Adaptive Differential
Pulse Code Modulation (ADPCM) [6], Code Excited Linear Prediction (CELP)
[7, 8], and Improved Multi Band Excitation (IMBE) [9, 10]. When the bit rate
is reduced, the perceived quality of these coders tends to degrade more
for some speech segments while remaining adequate for others. This shows
that the assumed coding principle is not adequate for all speech types.
In order to circumvent this problem, hybrid coders that combine different