Digital Signal Processing Reference
of a sample can introduce audible high frequency distortion, especially in
segments with short pitch periods. Consequently, the displacements should
be performed with a high resolution. The MELP/CELP coder preserves signal
continuity by transmitting an alignment phase for MELP-encoded frames and
using zero phase equalization for transitional frames. Zero phase equalization
may reduce the benefits of AbS coding by modifying the phase spectrum,
and it has been reported that the phase spectrum is perceptually important
[23-25]. Furthermore, zero phase equalization relies on accurate pitch pulse
position detection at the transitions, which can be difficult.
Harmonic excitation can be synchronized with the LPC residual by transmitting
the phases, which eliminates the above difficulties. However, this
requires a prohibitive capacity, making it unsuitable for low bit-rate applications.
As a compromise, Katugampala [26] proposed a new phase model
for the harmonic excitation called synchronized waveform-matched phase
model (SWPM). SWPM facilitates the integration of harmonic and AbS coders
by synchronizing the harmonic excitation with the LPC residual. SWPM
requires only two parameters and does not alter the perceptual quality of the
harmonically-synthesized speech. It also allows the ACELP mode to target
the speech waveform without modifying the perceptually-important phase
components or the frame boundaries.
9.4 Synchronized Waveform-Matched Phase Model
The SWPM maintains the time-synchrony between the original and the
harmonically-synthesized speech by transmitting the pitch pulse location
(PPL) closest to each synthesis frame boundary [27, 28, 26]. The SWPM
also preserves sufficient waveform similarity, such that switching between
the coding modes is transparent, by transmitting a phase value that indicates
the pitch pulse shape (PPS) of the corresponding pitch pulse. PPL and PPS are
estimated in every frame of 20 ms. SWPM needs to detect the pitch pulses only
in the stationary voiced segments, which is somewhat easier than detecting
the pitch pulses in the transitions as in [18]. The SWPM has the disadvantage
of transmitting two extra parameters (PPL and PPS), but the bottleneck of the
bit allocation of hybrid coders is usually in the waveform-coding mode. Furthermore,
in stationary voiced segments the location of the pitch pulses can
be predicted with high accuracy, and only an error needs to be transmitted.
The same argument applies to the shape of the pitch pulses.
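
The prediction argument above can be illustrated with a minimal sketch. It
assumes a constant pitch period over the frame, positions measured in samples,
and no quantization of the error; the function names (predict_ppl, encode_ppl,
decode_ppl) are hypothetical and this is not the coding scheme of [26]. The
idea shown is only that the encoder and decoder form the same prediction from
the previous PPL and the pitch period, so that only the prediction error has
to be sent.

```python
def predict_ppl(prev_ppl, pitch_period, frame_boundary):
    """Predict the pitch pulse location nearest to frame_boundary.

    Assumption: pulses are spaced exactly one pitch period apart, so the
    prediction is prev_ppl advanced by a whole number of pitch periods.
    """
    n_periods = round((frame_boundary - prev_ppl) / pitch_period)
    return prev_ppl + n_periods * pitch_period


def encode_ppl(actual_ppl, prev_ppl, pitch_period, frame_boundary):
    """Encoder side: return only the prediction error (unquantized here)."""
    return actual_ppl - predict_ppl(prev_ppl, pitch_period, frame_boundary)


def decode_ppl(error, prev_ppl, pitch_period, frame_boundary):
    """Decoder side: rebuild the PPL from the same prediction plus the error."""
    return predict_ppl(prev_ppl, pitch_period, frame_boundary) + error


# Illustrative values only: previous pulse at sample 150, pitch period of
# 50 samples, next frame boundary at sample 320, detected pulse at 302.
error = encode_ppl(302, prev_ppl=150, pitch_period=50, frame_boundary=320)
recovered = decode_ppl(error, prev_ppl=150, pitch_period=50, frame_boundary=320)
```

In this illustration the error is small compared with the absolute location,
which is why it would be cheaper to transmit. A PPS predictor could follow the
same pattern, with the previous PPS value serving as the prediction.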
In the harmonic synthesis, cubic phase interpolation [2] is applied between
the pitch pulse locations, setting the phases of all the harmonics equal
to PPS. This makes the waveform similarity between the original and the
synthesized speech highest in the vicinity of the selected pitch pulse locations.
However, this does not cause difficulties, since switching is restricted to frame
boundaries and the pitch pulse locations closest to the frame boundaries