Multimode Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems

model parameters of different modes may be quantized at different bit-rates,

allocating the minimum number of bits required for each mode to maintain

adequate quality.

In the example here, the LPC parameters are common to all the modes,

and quantized using a fixed number of bits per frame. This is advantageous

under noisy channel conditions, since the LPC parameters can be decoded

correctly even when the mode bits are in error. The LPC parameters are

quantized in the LSF domain using a multi-stage vector quantizer (MSVQ),

with a first-order moving average (MA) prediction [37]. Having quantized

the LSFs, the excitations of the three modes are quantized differently.
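
The combination of MA prediction and multi-stage quantization described above can be sketched as follows. This is only an illustrative outline, not the coder's actual implementation: the function names, the mean vector, the predictor coefficient `alpha`, the toy codebooks, and the greedy stage-by-stage search are all assumptions made for the example (practical MSVQ designs often keep several candidates per stage rather than one).

```python
def nearest(codebook, target):
    """Index of the codevector with the smallest squared error to target."""
    def dist(c):
        return sum((t - x) ** 2 for t, x in zip(target, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def msvq_ma_encode(lsf, prev_residual, mean, alpha, stages):
    """Quantize one LSF vector; returns (stage indices, quantized residual).

    prev_residual is the quantized residual of the previous frame, which is
    all a first-order MA predictor needs. All parameters are hypothetical.
    """
    predicted = [m + alpha * r for m, r in zip(mean, prev_residual)]
    target = [x - p for x, p in zip(lsf, predicted)]
    indices = []
    quantized = [0.0] * len(lsf)
    for codebook in stages:
        i = nearest(codebook, target)  # greedy search (an assumption)
        indices.append(i)
        quantized = [q + c for q, c in zip(quantized, codebook[i])]
        # The next stage quantizes the error left by this stage.
        target = [t - c for t, c in zip(target, codebook[i])]
    return indices, quantized


def msvq_ma_decode(indices, prev_residual, mean, alpha, stages):
    """Rebuild the LSF vector; returns (lsf_hat, quantized residual)."""
    residual = [0.0] * len(mean)
    for codebook, i in zip(stages, indices):
        residual = [r + c for r, c in zip(residual, codebook[i])]
    lsf_hat = [m + alpha * p + r
               for m, p, r in zip(mean, prev_residual, residual)]
    return lsf_hat, residual
```

Because the MA predictor depends only on the previous frame's quantized residual, the effect of a channel error on the prediction is confined to a single following frame. The decoder passes the returned residual back in as `prev_residual` for the next frame.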

9.9.2 Unvoiced Excitation Quantization

The hybrid coding algorithm synthesizes unvoiced speech using scaled white

Gaussian noise as the LPC excitation. Therefore, only a gain term is required

in addition to the LPC parameters to synthesize unvoiced speech. In order

to synthesize the unvoiced plosives with adequate quality, the gain term

should be updated at least every 5 ms. However, listening tests show that

synthesizing plosives using ACELP gives better perceptual quality. Therefore,

the plosives are synthesized using ACELP. The energy of the fricatives does

not show rapid fluctuations, and updating the gain once per 20 ms frame is

adequate to synthesize high-quality unvoiced fricatives.
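
The unvoiced synthesis described above, scaled white Gaussian noise driving the LPC synthesis filter, can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, the filter is taken as 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the default of 160 samples corresponds to a 20 ms frame only if the sampling rate is 8 kHz, which the text does not state.

```python
import random


def synthesize_unvoiced_frame(lpc, gain, n_samples=160, state=None, rng=None):
    """Return (samples, filter_state) for one unvoiced frame.

    lpc   : coefficients a_1..a_p of A(z) (sign convention is an assumption)
    gain  : decoded unvoiced gain, held constant over the frame
    state : last p output samples of the previous frame, most recent first
    """
    if rng is None:
        rng = random.Random()
    p = len(lpc)
    history = list(state) if state is not None else [0.0] * p
    samples = []
    for _ in range(n_samples):
        excitation = gain * rng.gauss(0.0, 1.0)  # scaled white Gaussian noise
        # All-pole synthesis: s[n] = e[n] - sum_j a_j * s[n - j]
        sample = excitation - sum(a * h for a, h in zip(lpc, history))
        samples.append(sample)
        history = ([sample] + history)[:p]
    return samples, history
```

Passing the returned state into the next call keeps the filter memory continuous across frame boundaries.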

The unvoiced gain $g_{uv}$ is quantized using a logarithmic scalar quantizer.

The quantized unvoiced gain $g_{uv_i}$ is given by

$$ g_{uv_i} = k \left( \frac{g_{\max} + k}{k} \right)^{\frac{i}{N-1}} - k \qquad \text{for } i = 0, 1, 2, \ldots, N-1 \qquad (9.56) $$

where $N$ is the number of quantizer levels, $g_{\max}$ defines the upper limit of $g_{uv_i}$,

and $k$ is a constant which controls the gradient of the exponential function.

All $g_{uv}$ values larger than $g_{\max}$ are clipped at $g_{\max}$. The constant $k$ is set to

16, and $N = 32$ quantizer levels were found sufficient to produce high-quality

unvoiced speech. Hence five bits are required to transmit the quantized unvoiced

gain, $g_{uv_i}$. Figure 9.27 depicts a typical plot of the unvoiced gain quantizer

levels for a maximum gain of $g_{\max} = 904$.
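
As an illustration of the quantizer in equation (9.56), the sketch below generates the reconstruction levels and maps a gain to a five-bit index. The function names are hypothetical, and the decision rule (nearest level in the log(g + k) domain) is an assumption, since the text specifies only the reconstruction levels and the clipping at g_max.

```python
import math


def unvoiced_gain_levels(g_max, k=16.0, n_levels=32):
    """Levels g_i = k * ((g_max + k) / k) ** (i / (N - 1)) - k, i = 0..N-1."""
    ratio = (g_max + k) / k
    return [k * ratio ** (i / (n_levels - 1)) - k for i in range(n_levels)]


def quantize_unvoiced_gain(g_uv, g_max, k=16.0, n_levels=32):
    """Return (index, quantized gain) for one unvoiced gain value."""
    g = min(max(g_uv, 0.0), g_max)  # gains above g_max are clipped to g_max
    # The levels are uniformly spaced in log(g + k), so round in that domain
    # (assumed decision rule).
    step = math.log((g_max + k) / k) / (n_levels - 1)
    index = round(math.log((g + k) / k) / step)
    index = min(max(index, 0), n_levels - 1)
    return index, k * math.exp(index * step) - k
```

With k = 16, N = 32 and g_max = 904, the first level is 0 and the last is 904, and successive values of g + k grow by a constant factor of roughly 1.14, so the levels are fine at low gains and coarse at high gains.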

9.9.3 Harmonic Excitation Quantization

The stationary voiced speech segments are synthesized using the synchronized

harmonic excitation model described earlier. The model parameters

of the harmonic excitation with SWPM are pitch period, pitch pulse location

(PPL), pitch pulse shape (PPS), harmonic amplitudes, and gain. The

AbS transition detection algorithm synthesizes the harmonic excitation using

SWPM at the encoder to evaluate the suitability of the harmonic mode.
