Multimode Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems

and parameter interpolation. Interpolation requires the parameters of the

previous frame; when switched from a different mode, those parameters may

not be directly available. Predictive quantization schemes also require the

previous memory. Techniques which eliminate these initialization/memory

problems are required.

9.2.1 Reliable Speech Classification

A voice activity detector (VAD) can be used to identify speech and silence

segments [9], while classification of speech into voiced and unvoiced segments

can be seen as the most basic speech classification technique. However, there

are coders in the literature which use up to six phonetic classes [10]. The

design of such a phonetic classification algorithm can be complicated and

computationally complex, and a simple classification with two or three

modes is sufficient to exploit the relative merits of waveform and harmonic

coding methods. The accuracy of the speech classification is critical for the

performance of a hybrid coder. For example, using noise excitation for a

stationary voiced segment (which should operate in harmonic coding mode)

can severely degrade the speech quality by converting the high energy of the

voiced original speech into noise in the synthesized speech; conversely, use of

harmonic excitation for unvoiced segments gives a tonal artifact. ACELP can

generally maintain acceptable quality for all the types of speech since it has

waveform-matching capability. During the speech classification process, it is

essential that the above cases are taken into account to generate a fail-safe

mode selection.
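The fail-safe idea above can be sketched as follows. This is only an illustration under stated assumptions: the three features (frame energy, zero-crossing rate, peak normalized autocorrelation), the function names, the mode labels and every threshold are hypothetical and are not taken from the text. The frame is assumed to hold 160 samples at 8 kHz, with pitch lags restricted to 20..80 samples so that each correlation uses at least half the frame. Harmonic coding and noise excitation are chosen only when the evidence is clear; every ambiguous frame falls back to ACELP, which tolerates all speech types.

```python
import math

def frame_features(frame, min_lag=20, max_lag=80):
    """Return (energy, zero-crossing rate, peak normalized autocorrelation)."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)) / (n - 1)
    peak = 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        num = sum(frame[i] * frame[i - lag] for i in range(lag, n))
        e_now = sum(frame[i] ** 2 for i in range(lag, n))
        e_old = sum(frame[i - lag] ** 2 for i in range(lag, n))
        den = math.sqrt(e_now * e_old)
        if den > 0.0:
            peak = max(peak, num / den)
    return energy, zcr, peak

def select_mode(frame, silence_energy=1e-4, voiced_peak=0.8,
                unvoiced_peak=0.3, unvoiced_zcr=0.3):
    """Pick a coding mode; all thresholds are illustrative assumptions."""
    energy, zcr, peak = frame_features(frame)
    if energy < silence_energy:
        return "silence"
    if peak >= voiced_peak:            # clearly periodic: harmonic coding
        return "harmonic"
    if peak <= unvoiced_peak and zcr >= unvoiced_zcr:
        return "noise"                 # clearly unvoiced: noise excitation
    return "acelp"                     # anything ambiguous: fail-safe mode

# A 200 Hz sinusoid (period 40 samples at 8 kHz) is strongly periodic.
voiced = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(160)]
print(select_mode(voiced))
print(select_mode([0.0] * 160))
```

In a real coder the thresholds would be tuned on labelled speech, and the decision would typically also use information from neighbouring frames.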

9.2.2 Phase Synchronization

Harmonic coders operating at 4 kb/s and below do not transmit phase

information, in order to allocate the available bits for accurate quantization

of the more important spectral magnitude information. They exploit the fact

that the human ear is partially phase-insensitive; as a result, the waveform

shape of the synthesized speech can be very different from that of the original,

often yielding negative SNRs. On the other hand, AbS coders preserve the

waveform similarity. Direct switching between those two modes without

any precautions will severely degrade the speech quality due to phase

discontinuities.
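A small numerical illustration of the negative SNR mentioned above, assuming a hypothetical three-harmonic frame (the amplitudes, phases, pitch and helper names are invented for the example). The synthesized frame keeps exactly the same harmonic magnitudes but offsets every phase by pi, an extreme case chosen because the result is easy to verify: the waveform is inverted, the error has twice the amplitude of the signal, and the SNR is 10 log10(1/4), about -6.02 dB.

```python
import math

def harmonic_frame(f0, amps, phases, n=160, fs=8000.0):
    """Sum of harmonics of f0 with the given amplitudes and phases."""
    return [sum(a * math.cos(2 * math.pi * (k + 1) * f0 * t / fs + p)
                for k, (a, p) in enumerate(zip(amps, phases)))
            for t in range(n)]

def snr_db(ref, test):
    """Waveform SNR in dB of `test` measured against `ref`."""
    signal = sum(x * x for x in ref)
    error = sum((x - y) ** 2 for x, y in zip(ref, test))
    return 10.0 * math.log10(signal / error)

amps = [1.0, 0.6, 0.3]        # hypothetical harmonic magnitudes
phases = [0.4, 1.1, -0.7]     # hypothetical original phases (radians)

original = harmonic_frame(200.0, amps, phases)
# Same magnitudes, but every phase is offset by pi (no phase transmitted).
synthesized = harmonic_frame(200.0, amps, [p + math.pi for p in phases])

print(round(snr_db(original, synthesized), 2))   # -6.02
```

Splicing a waveform-matched frame directly onto such a phase-altered frame would produce a jump at the frame boundary, which is the kind of discontinuity the section warns about.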

9.3 Summary of Hybrid Coders

The hybrid coding concept was introduced in the LPC vocoder [11],

which classifies speech frames into voiced or unvoiced, and synthesizes the

excitation using periodic pulses or white noise, respectively. Analysis-by-

synthesis CELP coders with dynamic bit allocation (DBA), which adaptively
