Digital Signal Processing Reference
In-Depth Information
different, especially between the major pitch pulses. The waveform similarity
is highest at the major excitation pulse locations and decreases along the pitch
cycles. This is due to the fact that SWPM models only the major pitch pulses
and it cannot model the minor pulses present in the residual signal when
the LPC residual energy is dispersed. Furthermore, the dispersed energy of
the LPC residual becomes concentrated around the major pitch pulses in the
excitation signal. The synthesized speech also exhibits larger variations in
the amplitude around the pitch pulse locations, compared with the original
speech.
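The drop in waveform similarity away from the major pulses can be illustrated with a toy example. The sketch below is not the SWPM synthesis itself: it assumes a Lorentzian pulse shape, hypothetical pulse positions, and a windowed signal-to-error ratio as the similarity measure, none of which come from the text. It only shows that when the synthesized excitation models the major pulse but not a minor pulse, the match is close around the major pulse and poor around the minor one.

```python
import math

def pulse(n, center, width=2.0):
    """Smooth unit-height pulse (Lorentzian shape, chosen only for illustration)."""
    return 1.0 / (1.0 + ((n - center) / width) ** 2)

def local_snr_db(ref, test, center, half_width):
    """Signal-to-error ratio in dB over [center - half_width, center + half_width]."""
    lo, hi = center - half_width, center + half_width + 1
    signal = sum(r * r for r in ref[lo:hi])
    error = sum((r - t) ** 2 for r, t in zip(ref[lo:hi], test[lo:hi]))
    return 10.0 * math.log10(signal / error)

period = 80                    # samples in one pitch cycle (assumed)
major_at, minor_at = 20, 60    # hypothetical pulse positions within the cycle

# Toy "original" residual: a major pulse plus a weaker minor pulse.
original = [pulse(n, major_at) + 0.4 * pulse(n, minor_at) for n in range(period)]
# Toy "synthesized" excitation: only the major pulse is modelled.
synthesized = [pulse(n, major_at) for n in range(period)]

snr_major = local_snr_db(original, synthesized, major_at, 5)
snr_minor = local_snr_db(original, synthesized, minor_at, 5)
print(f"local SNR at major pulse: {snr_major:.1f} dB")
print(f"local SNR at minor pulse: {snr_minor:.1f} dB")
```

In this toy case the local ratio is tens of dB around the major pulse and close to 0 dB around the unmodelled minor pulse, which mirrors the observation that similarity decreases along the pitch cycle.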
To understand how these observations affect subjective quality, informal
listening tests were conducted by switching between the
harmonically-synthesized speech and the original speech waveforms at
desired synthesis frame boundaries. The tests revealed occasional audible
artifacts at the mode transitions when switching from the harmonic mode to
the waveform-coding mode. However, there were no audible switching
artifacts when switching from waveform-coding to harmonic-coding mode,
i.e. at the onsets. The artifacts were found to have two causes:
difficulties in reliable pitch pulse detection and limitations in
representing the harmonic phases using the pitch pulse shape at some
segments. At some highly resonant segments, the LPC residual looks like
random noise and it is not possible even to define the pitch pulses. The
predicted pitch pulse location, assuming a continuing pitch contour, may be
incorrect at resonant tails. At such segments, the pitch pulse locations are
determined by applying AbS techniques in the speech domain, such that the
synthesized speech signal is synchronized with the original, as described in
the next subsection. In the speech segments illustrated in Figure 9.13c,
it is possible to detect dominant pitch pulses. However, the LPC residual
energy is dispersed throughout the pitch periods, making the pitch pulses
less significant, as described in Section 9.4.1. This effect reduces the coherence
of the LPC residual harmonic phases at the pulse locations, and the DFT phase
spectrum estimated at the pulse locations looks random. Female vowels with
short pitch periods show these characteristics. A dispersed phase spectrum
reduces the effectiveness of the pitch pulse shape, since the concept of pitch
pulse shape is based on the assumption that a pitch pulse is the result of
the superimposition of coherent phases, which have the same value at the
pitch pulse location. This effect is illustrated in Figure 9.14. The synthesized
pitch pulse models the major pulse in the LPC residual pitch period and
concentrates the energy at the pulse location. This is due to the single phase
value used to synthesize the pulse, as opposed to the more random-looking
phase spectrum of the original pitch cycle. This phenomenon introduces
phase discontinuities, which account for the audible switching artifacts.
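The link between phase coherence and energy concentration can be checked numerically. The following is a minimal sketch, assuming unit-amplitude harmonics, a pitch period of 80 samples and 30 harmonics (all illustrative choices, not values from the text). It compares a pitch cycle built with a single phase value at the pulse location against one built with random phases, using the crest factor (peak over RMS) as a rough measure of how concentrated the energy is.

```python
import math
import random

def synth_pitch_cycle(phases, period):
    """Sum unit-amplitude harmonics k = 1..K over one pitch period.

    phases[k - 1] is the phase of harmonic k at n = 0, the assumed pulse location.
    """
    return [
        sum(math.cos(2.0 * math.pi * k * n / period + phi)
            for k, phi in enumerate(phases, start=1))
        for n in range(period)
    ]

def crest_factor(x):
    """Peak magnitude divided by RMS; higher means more concentrated energy."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms

period = 80          # samples per pitch cycle (assumed)
num_harmonics = 30   # assumed; kept below period / 2

rng = random.Random(0)  # fixed seed so the run is repeatable

# Coherent case: every harmonic has the same phase (zero) at the pulse location.
coherent = synth_pitch_cycle([0.0] * num_harmonics, period)
# Dispersed case: random-looking phase spectrum, same harmonic amplitudes.
dispersed = synth_pitch_cycle(
    [rng.uniform(-math.pi, math.pi) for _ in range(num_harmonics)], period)

cf_coherent = crest_factor(coherent)
cf_dispersed = crest_factor(dispersed)
peak_index = max(range(period), key=lambda n: abs(coherent[n]))
print(f"crest factor, coherent phases:  {cf_coherent:.2f}")
print(f"crest factor, dispersed phases: {cf_dispersed:.2f}")
```

Both cycles carry the same energy because the harmonic amplitudes are identical. With coherent phases the energy piles up at the pulse location (crest factor sqrt(2K) for K harmonics), whereas random phases spread it across the period. Switching between two such signals at a frame boundary therefore changes the waveform shape even though the magnitude spectrum is unchanged.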
However, the click and pop sounds present at the mode transitions in speech
synthesized with SWPM are less annoying than those in a conventional