Digital Signal Processing Reference
In-Depth Information
Any waveform-coding technique can be used instead of ACELP. In fact,
this hybrid model [27] does not restrict the choice of coding technique for
speech transitions; it merely makes the mode decision and defines the target
waveform. In white-noise excited mode, the gain estimated from the LPC
residual energy is transmitted every 20 ms. The LPC parameters are
common to all modes and are estimated every 20 ms (with a 25 ms window
length); they are usually interpolated in the LSF domain for every subframe
in the synthesis process. To interpolate the LSFs, the LPC analysis
window is usually centred at the synthesis frame boundary, which requires a
look-ahead.
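The per-subframe LSF interpolation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the number of subframes or the interpolation weights, so the function name interpolate_lsfs, the default of four subframes, and the choice of linear weights evaluated at subframe centres are all hypothetical.

```python
def interpolate_lsfs(prev_lsf, curr_lsf, num_subframes=4):
    """Linearly interpolate between two LSF vectors, one result per subframe.

    prev_lsf: LSFs from the analysis window centred on the frame start.
    curr_lsf: LSFs from the analysis window centred on the frame end.
    num_subframes: assumed subframe count (not specified in the text).
    """
    if len(prev_lsf) != len(curr_lsf):
        raise ValueError("LSF vectors must have the same order")
    if num_subframes < 1:
        raise ValueError("num_subframes must be at least 1")
    interpolated = []
    for k in range(num_subframes):
        # Assumed weighting: evaluate the linear ramp at the subframe centre.
        w = (k + 0.5) / num_subframes
        interpolated.append(
            [(1.0 - w) * p + w * c for p, c in zip(prev_lsf, curr_lsf)]
        )
    return interpolated
```

Because both LSF sets sit on frame boundaries, the end-of-frame set requires samples beyond the current frame, which is the look-ahead mentioned above.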
A two-stage speech classification algorithm is used in the above coder. An
initial classification is made based on the tracked energy, low-band to high-
band energy ratio, and zero-crossing rate, and determines whether to use the
noise excitation or one of the other modes. The secondary classification, which
is based on an AbS process, chooses between harmonic excitation
and ACELP. Segments of plosives with high-energy spikes are synthesized
using ACELP. When the noise excitation mode is selected, there is no need
to estimate the excitation parameters of the other modes. If noise excitation is
not selected, the harmonic parameters are always estimated and the harmonic
excitation is generated at the encoder for the AbS transition detection. The
speech classification is described in detail in Section 9.6.
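As a rough illustration of the three features used by the initial classification, the sketch below computes frame energy, a low-band to high-band energy ratio, and the zero-crossing rate for one frame. The function name frame_features, the 8 kHz sampling rate, and the 1 kHz band split are assumptions made for the example; the energy tracking over time and the actual decision thresholds belong to the classifier itself and are not reproduced here.

```python
import cmath
import math


def frame_features(frame, sample_rate=8000, split_hz=1000.0):
    """Return (energy, low/high band energy ratio, zero-crossing rate).

    sample_rate and split_hz are assumed values, not taken from the text.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")

    # Mean-square energy of the frame.
    energy = sum(x * x for x in frame) / n

    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (n - 1)

    # Naive DFT over the non-negative frequencies, split into two bands.
    low = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        x_k = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(x_k) ** 2
        if k * sample_rate / n < split_hz:
            low += power
        else:
            high += power

    # Small constant avoids division by zero for band-limited frames.
    ratio = low / (high + 1e-12)
    return energy, ratio, zcr
```

A low-frequency tone yields a large band ratio and a small zero-crossing rate, whereas a noise-like frame tends towards the opposite; an initial classifier of this kind compares such values against thresholds.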
For simplicity, details of LPC and adaptive codebook memory update are
excluded from the block diagram. The encoder maintains an LPC synthesis
filter synchronized with the decoder, and uses the final memory locations for
ACELP and AbS transition detection in the next frame. Adaptive codebook
memory is always updated with the previous LPC excitation vector regardless
of the mode. In order to maintain the LPC and the adaptive codebook
memories, the LPC excitation is generated at the encoder, regardless of
the mode.
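The mode-independent memory update can be pictured as a fixed-length history buffer that always receives the latest LPC excitation. The following is a minimal sketch; the function name update_adaptive_codebook and the list-based buffer are illustrative assumptions, not the coder's actual data structures.

```python
def update_adaptive_codebook(memory, excitation):
    """Shift the newest excitation samples into a fixed-length history.

    memory: past excitation samples, oldest first.
    excitation: LPC excitation just generated for the frame or subframe,
        whichever mode (noise, harmonic or ACELP) produced it.
    Returns a new list with the same length as memory.
    """
    size = len(memory)
    if size == 0:
        return []
    combined = list(memory) + list(excitation)
    # Keep only the most recent `size` samples.
    return combined[-size:]
```

For example, update_adaptive_codebook([0, 0, 0, 0], [1, 2]) returns [0, 0, 1, 2]. Calling the same update after every frame, whatever the selected mode, keeps the encoder's buffer identical to the decoder's.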
9.5.1 Synchronized Harmonic Excitation
In the harmonic mode, the pitch and harmonic amplitudes of the LPC residual
are estimated for every 20 ms frame. The estimation windows are placed at
the end of the synthesis frames, and a look-ahead is used to facilitate the
harmonic parameter interpolation. The pitch estimation algorithm is based
on the sinusoidal speech-model matching proposed by McAulay [36] and
improved by Atkinson [4] and Villette [37, 38]. The initial pitch is refined to
0.2 sample accuracy using synthetic spectral matching proposed by Griffin
[3]. The harmonic amplitudes are estimated by simple peak-picking of the
magnitude spectrum of the LPC residual.
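Peak-picking of the harmonic amplitudes can be sketched as taking, for each multiple of the fundamental, the largest magnitude-spectrum value in a band around that harmonic. The function name harmonic_amplitudes, the band of half the fundamental spacing on either side, and the requirement of at least two bins per harmonic are assumptions for illustration; the text only states that simple peak-picking is used.

```python
import math


def harmonic_amplitudes(magnitude, pitch_period, fft_size):
    """Pick the largest spectral magnitude around each pitch harmonic.

    magnitude: magnitude spectrum of the LPC residual, bins 0..fft_size/2.
    pitch_period: pitch period in samples (may be fractional).
    fft_size: DFT length used to compute the spectrum.
    """
    f0_bins = fft_size / pitch_period  # harmonic spacing in bins
    if f0_bins < 2.0:
        raise ValueError("pitch period too long for this FFT size")
    half_width = f0_bins / 2.0  # assumed search range around each harmonic
    last_bin = len(magnitude) - 1
    num_harmonics = int(last_bin / f0_bins)

    amplitudes = []
    for h in range(1, num_harmonics + 1):
        centre = h * f0_bins
        lo = max(0, int(math.ceil(centre - half_width)))
        hi = min(last_bin, int(math.floor(centre + half_width)))
        amplitudes.append(max(magnitude[lo:hi + 1]))
    return amplitudes
```

Searching a band rather than reading a single bin tolerates small pitch errors and the fact that harmonics rarely fall exactly on DFT bin centres.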
The harmonic excitation e_h(n) is generated at the encoder for the AbS
transition detection and to maintain the LPC and adaptive codebook memories,