Multimode Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems

Any waveform-coding technique can be used instead of ACELP. In fact,

this hybrid model [27] does not restrict the choice of coding technique for

speech transitions; it merely makes the mode decision and defines the target

waveform. In white noise excited mode, the gain estimated from the LPC

residual energy is transmitted every 20 ms. The LPC parameters are

common to all the modes and are estimated every 20 ms (with a 25 ms window

length); they are usually interpolated in the LSF domain for every subframe

in the synthesis process. In order to interpolate the LSFs, the LPC analysis

window is usually centred at the synthesis frame boundary, which requires a

look-ahead.
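
The subframe interpolation of the LSFs can be illustrated with a minimal sketch. It assumes plain linear interpolation between the LSF vectors analysed at the previous and current frame boundaries, four subframes per frame, and weights taken at the subframe centres; none of these choices is specified in the text, and the function name interpolate_lsfs is hypothetical.

```python
def interpolate_lsfs(prev_lsf, curr_lsf, num_subframes=4):
    """Linearly interpolate two LSF vectors, returning one vector per subframe.

    Assumption: prev_lsf and curr_lsf were analysed at the start and end
    boundaries of the synthesis frame, and each subframe uses the weight
    at its centre.
    """
    if len(prev_lsf) != len(curr_lsf):
        raise ValueError("LSF vectors must have the same order")
    subframes = []
    for k in range(num_subframes):
        w = (k + 0.5) / num_subframes  # weight given to the current frame's LSFs
        subframes.append([(1.0 - w) * p + w * c
                          for p, c in zip(prev_lsf, curr_lsf)])
    return subframes
```

Because each interpolated vector is a convex combination of two ordered LSF vectors, the ordering of the LSFs is preserved in every subframe.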

A two-stage speech classification algorithm is used in the above coder. An

initial classification is made based on the tracked energy, low-band to high-

band energy ratio, and zero-crossing rate, and determines whether to use the

noise excitation or one of the other modes. The secondary classification, which

is based on an AbS process, makes a choice between the harmonic excitation

or ACELP. Segments of plosives with high-energy spikes are synthesized

using ACELP. When the noise excitation mode is selected, there is no need

to estimate the excitation parameters of the other modes. If noise excitation is

not selected, the harmonic parameters are always estimated and the harmonic

excitation is generated at the encoder for the AbS transition detection. The

speech classification is described in detail in Section 9.6.
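
The three features used by the initial classification can be sketched as follows. This is only an illustration: the two-tap sum and difference filters used for the band split, the fixed thresholds, and the decision rule itself are all assumptions made here (the text uses a tracked energy and defers the actual algorithm to Section 9.6), and every function name is hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def band_energy_ratio(frame, eps=1e-12):
    """Crude low-band to high-band energy ratio.

    Assumption: a two-tap average stands in for the low-pass filter and a
    two-tap difference for the high-pass filter.
    """
    low = [(b + a) / 2 for a, b in zip(frame, frame[1:])]
    high = [(b - a) / 2 for a, b in zip(frame, frame[1:])]
    return sum(x * x for x in low) / (sum(x * x for x in high) + eps)

def initial_mode(frame, energy_floor=1e-4, ratio_floor=1.0, zcr_ceiling=0.3):
    """Return 'noise' or 'other' using hypothetical fixed thresholds."""
    if frame_energy(frame) < energy_floor:
        return "noise"  # very low energy: treat as background noise
    if (band_energy_ratio(frame) < ratio_floor
            and zero_crossing_rate(frame) > zcr_ceiling):
        return "noise"  # high-band dominated with many sign changes
    return "other"      # left to the secondary (AbS) classification
```

A frame returned as 'other' would then go to the secondary classification, which is not modelled in this sketch.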

For simplicity, details of LPC and adaptive codebook memory update are

excluded from the block diagram. The encoder maintains an LPC synthesis

filter synchronized with the decoder, and uses the final memory locations for

ACELP and AbS transition detection in the next frame. Adaptive codebook

memory is always updated with the previous LPC excitation vector regardless

of the mode. In order to maintain the LPC and the adaptive codebook

memories, the LPC excitation is generated at the encoder, regardless of

the mode.
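
The mode-independent adaptive codebook update can be pictured as a fixed-length history buffer into which the newest LPC excitation samples are shifted. The sketch below assumes a simple list-based buffer; the buffer length and the function name update_adaptive_codebook are hypothetical and not taken from the coder described here.

```python
def update_adaptive_codebook(memory, excitation):
    """Return the history buffer after shifting in the newest excitation.

    The buffer keeps its length: the oldest samples are dropped and the
    excitation of the frame just synthesized is appended, whichever mode
    (noise, harmonic or ACELP) produced it.
    """
    size = len(memory)
    if size == 0:
        return []
    if len(excitation) >= size:
        # The new excitation alone fills the buffer.
        return list(excitation[-size:])
    return list(memory[len(excitation):]) + list(excitation)
```

Running the same update at the encoder and the decoder after every frame keeps the two buffers identical, which is what allows ACELP to be used in the frame that follows a noise or harmonic frame.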

9.5.1 Synchronized Harmonic Excitation

In the harmonic mode, the pitch and harmonic amplitudes of the LPC residual

are estimated for every 20 ms frame. The estimation windows are placed at

the end of the synthesis frames, and a look-ahead is used to facilitate the

harmonic parameter interpolation. The pitch estimation algorithm is based

on the sinusoidal speech-model matching proposed by McAulay [36] and

improved by Atkinson [4] and Villette [37, 38]. The initial pitch is refined to

0.2 sample accuracy using synthetic spectral matching proposed by Griffin

[3]. The harmonic amplitudes are estimated by simple peak-picking of the

magnitude spectrum of the LPC residual.
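
Peak-picking of the harmonic amplitudes can be sketched as below. The sketch assumes that the largest magnitude within half a fundamental on either side of each harmonic is taken as its amplitude, uses a direct DFT with a rectangular window for brevity, and scales the spectrum so that a unit-amplitude sinusoid gives a peak close to 1. These choices and the function names are assumptions made for illustration, not details given in the text.

```python
import cmath
import math

def magnitude_spectrum(frame, nfft):
    """Magnitude of the first nfft//2 + 1 DFT bins, scaled by 2/len(frame).

    DC and Nyquist bins are not treated specially by this scaling.
    """
    scale = 2.0 / len(frame)
    return [scale * abs(sum(x * cmath.exp(-2j * math.pi * k * n / nfft)
                            for n, x in enumerate(frame)))
            for k in range(nfft // 2 + 1)]

def harmonic_amplitudes(mag, pitch_period, nfft):
    """Pick the largest magnitude around each multiple of the fundamental.

    pitch_period is in samples and may be fractional.
    """
    f0_bin = nfft / pitch_period  # fundamental frequency in DFT bins
    half = f0_bin / 2.0           # search half-width around each harmonic
    last = len(mag) - 1
    amps = []
    h = 1
    while h * f0_bin <= last:
        centre = h * f0_bin
        lo = max(0, math.ceil(centre - half))
        hi = min(last, math.floor(centre + half))
        amps.append(max(mag[lo:hi + 1]))
        h += 1
    return amps
```

For a 64-sample residual with a pitch period of 16 samples, the fundamental falls on bin 4 of a 64-point DFT, so each harmonic is searched over bins 4h - 2 to 4h + 2.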

The harmonic excitation e_h(n) is generated at the encoder for the AbS transition

detection and to maintain the LPC and adaptive codebook memories,
