Multimode Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems


of a sample can introduce audible high frequency distortion, especially in

segments with short pitch periods. Consequently, the displacements should

be performed with a high resolution. The MELP/CELP coder preserves signal

continuity by transmitting an alignment phase for MELP-encoded frames and

using zero phase equalization for transitional frames. Zero phase equalization

may reduce the benefits of AbS coding by modifying the phase spectrum,

and it has been reported that the phase spectrum is perceptually important

[23-25]. Furthermore, zero phase equalization relies on accurate pitch pulse

position detection at the transitions, which can be difficult.

Harmonic excitation can be synchronized with the LPC residual by

transmitting the phases, which eliminates the above difficulties. However,

this requires prohibitive transmission capacity, making it unsuitable for low bit-rate

applications. As a compromise, Katugampala [26] proposed a new phase model

for the harmonic excitation called synchronized waveform-matched phase

model (SWPM). SWPM facilitates the integration of harmonic and AbS coders

by synchronizing the harmonic excitation with the LPC residual. SWPM

requires only two parameters and does not alter the perceptual quality of the

harmonically-synthesized speech. It also allows the ACELP mode to target

the speech waveform without modifying the perceptually-important phase

components or the frame boundaries.

9.4 Synchronized Waveform-Matched Phase Model

The SWPM maintains the time-synchrony between the original and the

harmonically-synthesized speech by transmitting the pitch pulse

location (PPL) closest to each synthesis frame boundary [26-28]. The SWPM

also preserves sufficient waveform similarity, such that switching between

the coding modes is transparent, by transmitting a phase value that indicates

the pitch pulse shape (PPS) of the corresponding pitch pulse. PPL and PPS are

estimated in every frame of 20 ms. SWPM needs to detect the pitch pulses only

in the stationary voiced segments, which is somewhat easier than detecting

the pitch pulses in the transitions as in [18]. The SWPM has the disadvantage

of transmitting two extra parameters (PPL and PPS), but the bottleneck of the

bit allocation of hybrid coders is usually in the waveform-coding mode.

Furthermore, in stationary voiced segments the location of the pitch pulses can

be predicted with high accuracy, and only an error needs to be transmitted.

The same argument applies to the shape of the pitch pulses.
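
The prediction argument above can be illustrated with a minimal sketch. It assumes a constant pitch period across the frame and positions measured in samples; the example values correspond to an assumed 8 kHz sampling rate, which gives 160 samples per 20 ms frame. The function names predict_ppl and ppl_residual are hypothetical and are not taken from the source, which does not specify the predictor at this point.

```python
def predict_ppl(prev_ppl: float, pitch_period: float, frame_boundary: float) -> float:
    """Predict the pitch pulse location (PPL) closest to a frame boundary.

    Assumption (not from the source): the pitch period is constant, so
    pulses repeat every `pitch_period` samples after `prev_ppl`.
    """
    # Whole number of pitch periods that lands nearest to the boundary.
    n_periods = round((frame_boundary - prev_ppl) / pitch_period)
    return prev_ppl + n_periods * pitch_period


def ppl_residual(actual_ppl: float, prev_ppl: float,
                 pitch_period: float, frame_boundary: float) -> float:
    """Prediction error that would be transmitted instead of the full PPL."""
    return actual_ppl - predict_ppl(prev_ppl, pitch_period, frame_boundary)


# Illustrative values: previous pulse at sample 158, pitch period of 40
# samples, next frame boundary at sample 320, detected pulse at 319.5.
predicted = predict_ppl(158.0, 40.0, 320.0)
residual = ppl_residual(319.5, 158.0, 40.0, 320.0)
print(predicted, residual)
```

In a stationary voiced segment the residual should stay within a small fraction of a pitch period, which is why coding the error is expected to cost fewer bits than coding the absolute location.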

In the harmonic synthesis, cubic phase interpolation [2] is applied between

the pitch pulse locations, setting the phases of all the harmonics equal

to PPS. This makes the waveform similarity between the original and the

synthesized speech highest in the vicinity of the selected pitch pulse locations.
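
As a rough illustration of why the match is best near the pitch pulse, the sketch below synthesizes one segment with every harmonic phase set to PPS at the pulse location. It is a simplification under stated assumptions: the pitch is held constant, the cubic phase interpolation between pulse locations is omitted, and the function name and amplitude values are illustrative rather than taken from the source.

```python
import math


def synth_pulse_segment(amplitudes, pitch_period, ppl, pps, n_samples):
    """Sum of harmonics whose phases all equal `pps` at sample `ppl`.

    s[n] = sum_k A_k * cos(k * w0 * (n - ppl) + pps), with w0 = 2*pi / pitch_period.
    Constant pitch is assumed; no phase interpolation is performed.
    """
    w0 = 2.0 * math.pi / pitch_period
    segment = []
    for n in range(n_samples):
        sample = 0.0
        # Harmonic index k starts at 1 (the fundamental).
        for k, amp in enumerate(amplitudes, start=1):
            sample += amp * math.cos(k * w0 * (n - ppl) + pps)
        segment.append(sample)
    return segment


# Illustrative harmonic amplitudes; PPS = 0 gives a positive, symmetric pulse.
amps = [1.0, 0.8, 0.6, 0.4]
seg = synth_pulse_segment(amps, pitch_period=40.0, ppl=25.0, pps=0.0, n_samples=40)
peak_index = max(range(len(seg)), key=lambda n: seg[n])
print(peak_index)
```

With PPS set to zero all harmonics add coherently at the pulse location, so the peak of the synthesized segment falls on the PPL; changing PPS alters the pulse shape while the pulse stays anchored at the same location.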

However, this does not cause difficulties, since switching is restricted to frame

boundaries and the pitch pulse locations closest to the frame boundaries
