Digital Signal Processing Reference
and parameter interpolation. Interpolation requires the parameters of the
previous frame; when switched from a different mode, those parameters may
not be directly available. Predictive quantization schemes also require the
previous memory. Techniques which eliminate these initialization/memory
problems are required.
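
The initialization problem described above can be illustrated with a minimal sketch. It assumes linear interpolation of a parameter vector across subframes and one simple fallback (holding the current frame's values) when the previous frame's parameters are unavailable after a mode switch. The function name, the subframe count, and the fallback itself are illustrative assumptions, not the technique prescribed by the text.

```python
from typing import List, Optional, Sequence


def interpolate_params(
    prev: Optional[Sequence[float]],
    curr: Sequence[float],
    num_subframes: int = 4,
) -> List[List[float]]:
    """Linearly interpolate a parameter vector across the subframes of a frame.

    prev is the previous frame's parameter vector, or None when it is not
    available (for example, right after switching from a different mode).
    """
    if prev is None:
        # Assumed fallback: with no history, hold the current values constant.
        prev = curr
    if len(prev) != len(curr):
        raise ValueError("parameter vectors must have the same length")
    out = []
    for k in range(1, num_subframes + 1):
        # Weight of the current frame; reaches 1.0 at the last subframe.
        w = k / num_subframes
        out.append([(1.0 - w) * p + w * c for p, c in zip(prev, curr)])
    return out


# Normal operation: the previous frame is known.
smooth = interpolate_params([0.0, 2.0], [4.0, 6.0])
# After a mode switch: no previous parameters, so the values are held.
held = interpolate_params(None, [4.0, 6.0])
```

Holding the current values avoids interpolating from stale or undefined memory, at the cost of a less smooth parameter track in the first frame after a switch.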
9.2.1 Reliable Speech Classification
A voice activity detector (VAD) can be used to identify speech and silence
segments [9], while classification of speech into voiced and unvoiced segments
can be seen as the most basic speech classification technique. However, there
are coders in the literature which use up to six phonetic classes [10]. The
design of such a phonetic classification algorithm can be complicated and
computationally complex, and a simple classification with two or three
modes is sufficient to exploit the relative merits of waveform and harmonic
coding methods. The accuracy of the speech classification is critical for the
performance of a hybrid coder. For example, using noise excitation for a
stationary voiced segment (which should operate in harmonic coding mode)
can severely degrade the speech quality by converting the high voiced
energy of the original speech into noise in the synthesized speech, while
using harmonic excitation for unvoiced segments produces a tonal artifact.
ACELP can generally maintain acceptable quality for all types of speech
since it has
waveform-matching capability. During the speech classification process, it is
essential that the above cases are taken into account to generate a fail-safe
mode selection.
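
A fail-safe mode selection of this kind can be sketched as follows. The sketch assumes a simple three-mode classifier driven by frame energy and zero-crossing rate: clearly voiced frames go to harmonic coding, clearly unvoiced frames to noise excitation, and anything ambiguous falls back to ACELP. The features, thresholds, and mode names are illustrative assumptions and are not taken from the coders cited in the text.

```python
import math
from typing import Sequence, Tuple


def frame_features(frame: Sequence[float]) -> Tuple[float, float]:
    """Return (energy in dB, zero-crossing rate) for one frame."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    energy = sum(x * x for x in frame) / n
    energy_db = 10.0 * math.log10(energy + 1e-12)  # small floor avoids log(0)
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return energy_db, crossings / (n - 1)


def select_mode(
    frame: Sequence[float],
    silence_db: float = -50.0,   # assumed threshold
    voiced_zcr: float = 0.1,     # assumed threshold
    unvoiced_zcr: float = 0.3,   # assumed threshold
) -> str:
    """Pick a coding mode, falling back to ACELP when the decision is unclear."""
    energy_db, zcr = frame_features(frame)
    if energy_db < silence_db:
        return "silence"
    if zcr <= voiced_zcr:
        return "harmonic"  # confidently voiced
    if zcr >= unvoiced_zcr:
        return "noise"     # confidently unvoiced
    return "acelp"         # ambiguous: waveform matching is the safe choice


# A 100 Hz tone sampled at 8 kHz stands in for a stationary voiced frame.
voiced = [0.5 * math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(160)]
mode = select_mode(voiced)
```

The gap between the two zero-crossing thresholds is what makes the selection fail-safe: a misclassification can only land in ACELP, never in the wrong parametric mode.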
9.2.2 Phase Synchronization
Harmonic coders operating at 4 kb/s and below do not transmit phase
information, in order to allocate the available bits for accurate quantization
of the more important spectral magnitude information. They exploit the fact
that the human ear is partially phase-insensitive; as a result, the waveform
shape of the synthesized speech can be very different from that of the
original speech, often yielding negative SNRs. On the other hand, AbS coders
preserve the
waveform similarity. Direct switching between those two modes without
any precautions will severely degrade the speech quality due to phase
discontinuities.
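
The negative SNR mentioned above can be reproduced with a small numerical sketch. It assumes a toy signal of three harmonics of a 100 Hz fundamental sampled at 8 kHz, and a decoder that keeps the harmonic magnitudes but replaces the original phases with zeros. All amplitudes, phases, and function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def synthesize(
    amps: Sequence[float],
    phases: Sequence[float],
    f0: float = 100.0,
    fs: float = 8000.0,
    n_samples: int = 160,
) -> List[float]:
    """Sum of harmonics of f0 with the given amplitudes and phases."""
    out = []
    for n in range(n_samples):
        t = n / fs
        out.append(
            sum(
                a * math.cos(2.0 * math.pi * (h + 1) * f0 * t + p)
                for h, (a, p) in enumerate(zip(amps, phases))
            )
        )
    return out


def snr_db(ref: Sequence[float], test: Sequence[float]) -> float:
    """Waveform SNR in dB of test against ref."""
    sig = sum(x * x for x in ref)
    err = sum((x - y) ** 2 for x, y in zip(ref, test))
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(sig / err)


amps = [1.0, 0.5, 0.25]
original = synthesize(amps, [2.0, -1.0, 2.5])  # assumed "true" phases
decoded = synthesize(amps, [0.0, 0.0, 0.0])    # magnitudes kept, phases dropped
print(snr_db(original, decoded))               # negative for these values
```

Both signals have identical harmonic magnitudes, yet the waveform error carries more energy than the signal itself. A frame boundary where one side follows the original waveform and the other follows the phase-free synthesis therefore joins two waveforms that do not line up.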
9.3 Summary of Hybrid Coders
The hybrid coding concept was introduced in the LPC vocoder [11],
which classifies speech frames into voiced or unvoiced, and synthesizes the
excitation using periodic pulses or white noise, respectively. Analysis-by-
synthesis CELP coders with dynamic bit allocation (DBA), which adaptively