Digital Signal Processing Reference
the waveform-coding target, which follows the harmonic mode. The remain-
ing phase discontinuities can be corrected within the first waveform-coding
frame, since SWPM keeps the phase discontinuities at a minimum and the
pitch periods are synchronized.
As a first approach the harmonic excitation is extended into the next frame
and the synthesized speech is linearly interpolated with the original speech
at the beginning of the frame in order to produce the waveform-coding
target. Listening tests were carried out with different interpolation lengths.
The waveform-coding target was not quantized, in order to isolate the
distortions due to switching. The tests were extended to determine how the
audibility of the phase discontinuities varies with the frequency of the
harmonics, by manually shifting one phase at a time and synthesizing the rest
of the harmonics using the original phases. Phase shifts of π/2 and π were
used. Listening tests show that for various interpolation lengths the phase
discontinuities below 1 kHz are audible, and an interpolation length as small
as 10 samples is sufficient to mask distortions in the higher frequencies.
Furthermore, male speech segments with long pitch periods, around 80
samples and above, do not cause audible switching artifacts. Male speech
segments with long pitch periods have well-resolved short-term and long-
term correlations, and produce clear and sharp pitch pulses, which can be
easily modeled by SWPM. Therefore only the harmonics below 1 kHz of the
segments with pitch periods shorter than 80 samples are considered in the
offset target modification process.
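The linear interpolation used in this first approach can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `interpolated_target`, the argument names, and the exact weighting (the target starts on the extended harmonic synthesis and reaches the original speech after `interp_len` samples) are assumptions made for the example.

```python
def interpolated_target(original, synthesized, interp_len):
    """Build a waveform-coding target for the first frame after a switch.

    Assumption: the target equals the extended harmonic synthesis at n = 0
    and moves linearly to the original speech over interp_len samples.
    """
    if interp_len < 0 or interp_len > min(len(original), len(synthesized)):
        raise ValueError("interp_len must fit inside both signals")
    target = list(original)  # samples past the ramp are the original speech
    for n in range(interp_len):
        w = n / interp_len  # weight on the original speech, 0 at n = 0
        target[n] = (1.0 - w) * synthesized[n] + w * original[n]
    return target


# Example: a 10-sample ramp, the shortest length the text reports as
# sufficient to mask distortions in the higher frequencies.
frame = interpolated_target([1.0] * 20, [0.0] * 20, 10)
```

With `interp_len` set to 10, only the first ten samples of the frame differ from the original speech, so the waveform coder sees an unmodified target for the rest of the frame.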
The harmonic excitation is extended beyond the mode transition frame
boundary, and the synthesized speech is generated in order to estimate the
harmonic phases at the mode transition frame boundary. The phase of the $k$th
harmonic of the excitation is computed as follows:

$$\theta_k^{i+1}(n) = \theta_k^{i} + \frac{2\pi k n}{\tau^{i}} \quad \text{for } 0 \le n < N \qquad (9.36)$$

where $\theta_k^{i}$ is the phase of the $k$th harmonic and $\tau^{i}$ is the pitch at the end of
synthesis frame $i$. The excitation signal is given by

$$e_h^{i+1}(n) = \sum_{k=1}^{K} a_k^{i} \cos\left(\theta_k^{i+1}(n)\right) \qquad (9.37)$$

where $K$ is the number of harmonics and $a_k^{i}$ is the amplitude of the $k$th
harmonic estimated at the end of the synthesis frame $i$. The excitation signal
is filtered through the LPC synthesis filter to produce the synthesized speech
signal, with the coefficients estimated at the end of the synthesis frame $i$.
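The phase extension, harmonic sum, and synthesis filtering described here can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the book's implementation: the function and argument names are hypothetical, the pitch is taken in samples, and the all-pole filter assumes the convention A(z) = 1 + Σ a_j z^(-j), which may differ in sign from the convention used elsewhere in the text.

```python
import math


def harmonic_excitation(phases, amplitudes, pitch, frame_len):
    """Extend the harmonic excitation over the next frame.

    phases[k-1] and amplitudes[k-1] are the phase and amplitude of the k-th
    harmonic at the end of frame i; pitch is the pitch period in samples.
    Each harmonic phase advances by 2*pi*k*n/pitch for 0 <= n < frame_len.
    """
    if len(phases) != len(amplitudes):
        raise ValueError("one phase and one amplitude per harmonic")
    excitation = []
    for n in range(frame_len):
        sample = 0.0
        for k, (theta, a) in enumerate(zip(phases, amplitudes), start=1):
            sample += a * math.cos(theta + 2.0 * math.pi * k * n / pitch)
        excitation.append(sample)
    return excitation


def lpc_synthesis(excitation, lpc, memory):
    """All-pole synthesis: s[n] = e[n] - sum_j lpc[j] * s[n-1-j].

    Assumption: A(z) = 1 + sum_j lpc[j] z^-(j+1). memory holds the previous
    output samples, most recent first, and has the same length as lpc.
    """
    if len(memory) != len(lpc):
        raise ValueError("memory must hold one sample per LPC coefficient")
    history = list(memory)
    speech = []
    for e in excitation:
        s = e - sum(c * h for c, h in zip(lpc, history))
        speech.append(s)
        history = [s] + history[:-1]  # shift the newest output into the state
    return speech


# Example: one harmonic, pitch of 4 samples, first-order filter, zero state.
exc = harmonic_excitation([0.0], [1.0], 4.0, 4)
out = lpc_synthesis(exc, [-0.5], [0.0])
```

Passing the filter state in explicitly keeps the synthesis continuous across the frame boundary, since the caller can supply the state left by the previous frame.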
The LPC memories after synthesizing the $i$th frame are used as the initial
memories. The speech samples synthesized for the $i$th and $(i+1)$th frames are