Multimode Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems

the waveform-coding target, which follows the harmonic mode. The remaining phase discontinuities can be corrected within the first waveform-coding frame, since SWPM keeps the phase discontinuities at a minimum and the pitch periods are synchronized.

As a first approach, the harmonic excitation is extended into the next frame, and the synthesized speech is linearly interpolated with the original speech at the beginning of the frame to produce the waveform-coding target. Listening tests were carried out with different interpolation lengths. The waveform-coding target was not quantized, in order to isolate the distortions due to switching. The tests were extended to determine how the audibility of the phase discontinuities varies with the frequency of the harmonics, by manually shifting one phase at a time and synthesizing the rest of the harmonics using the original phases. Phase shifts of π/2 and π were used. The listening tests show that, for the various interpolation lengths tested, phase discontinuities below 1 kHz are audible, whereas an interpolation length as small as 10 samples is sufficient to mask distortions at the higher frequencies. Furthermore, male speech segments with long pitch periods, around 80 samples and above, do not cause audible switching artifacts. Such segments have well-resolved short-term and long-term correlations and produce clear, sharp pitch pulses, which are easily modeled by SWPM. Therefore, only the harmonics below 1 kHz of segments with pitch periods shorter than 80 samples are considered in the offset target modification process.
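The interpolation and the selection rule described above can be sketched as follows. This is a minimal illustration, not the book's implementation: the function names, the direction of the ramp (synthesized speech fading into the original speech), the 8 kHz sampling rate, and the use of half the pitch period as the harmonic count are all assumptions made for the example.

```python
def crossfade_target(synth, original, interp_len):
    """Blend the extended harmonic synthesis into the original speech.

    The weight on the original speech rises linearly from 0 at the frame
    start to 1 after interp_len samples (ramp direction is an assumption).
    """
    if len(synth) != len(original):
        raise ValueError("synth and original must have the same length")
    if interp_len < 1:
        raise ValueError("interp_len must be at least 1")
    target = []
    for n, (s, x) in enumerate(zip(synth, original)):
        w = min(n / interp_len, 1.0)  # interpolation weight on the original
        target.append((1.0 - w) * s + w * x)
    return target


def harmonics_to_modify(pitch, fs=8000.0, cutoff_hz=1000.0, long_pitch=80):
    """Return the harmonic indices considered for target modification.

    Harmonic k lies at k * fs / pitch Hz. Segments with pitch periods of
    long_pitch samples or more are skipped entirely. The sampling rate fs
    and the harmonic count pitch // 2 are assumptions for illustration.
    """
    if pitch >= long_pitch:
        return []
    num_harmonics = int(pitch // 2)  # harmonics up to the Nyquist frequency
    return [k for k in range(1, num_harmonics + 1)
            if k * fs / pitch < cutoff_hz]
```

For example, with a pitch period of 40 samples at the assumed 8 kHz rate the fundamental is 200 Hz, so only harmonics 1 to 4 fall strictly below 1 kHz, while a pitch period of 80 samples selects nothing.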

The harmonic excitation is extended beyond the mode transition frame boundary, and the synthesized speech is generated in order to estimate the harmonic phases at the mode transition frame boundary. The phase of the $k$th harmonic of the excitation is computed as follows:

$$\theta_k^{i+1}(n) = \theta_k^{i} + \frac{2\pi k n}{\tau^{i}}, \qquad 0 \le n < N \tag{9.36}$$

where $\theta_k^{i}$ is the phase of the $k$th harmonic and $\tau^{i}$ is the pitch at the end of synthesis frame $i$. The excitation signal is given by

$$e_h^{i+1}(n) = \sum_{k=1}^{K} a_k^{i} \cos\theta_k^{i+1}(n) \tag{9.37}$$

where $K$ is the number of harmonics and $a_k^{i}$ is the amplitude of the $k$th harmonic estimated at the end of synthesis frame $i$. The excitation signal is filtered through the LPC synthesis filter to produce the synthesized speech signal, with the coefficients estimated at the end of synthesis frame $i$. The LPC memories after synthesizing the $i$th frame are used as the initial memories. The speech samples synthesized for the $i$th and $(i+1)$th frames are
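The extension of the harmonic excitation in equations (9.36) and (9.37), followed by LPC synthesis with carried-over filter memories, might be sketched as below. This is an illustration under stated assumptions rather than the book's code: the function names are hypothetical, the pitch is held constant over the extension as in (9.36), and the synthesis filter is assumed to be 1/A(z) with A(z) = 1 + sum_j a_j z^-j, a sign convention the text does not specify.

```python
import math


def extend_harmonic_excitation(phases, amps, pitch, n_samples):
    """Evaluate e[n] = sum_k a_k * cos(theta_k + 2*pi*k*n/pitch), k = 1..K.

    phases and amps hold the end-of-frame values for harmonics 1..K;
    pitch is the pitch period in samples at the end of the frame.
    """
    exc = []
    for n in range(n_samples):
        sample = 0.0
        for k, (theta, a) in enumerate(zip(phases, amps), start=1):
            sample += a * math.cos(theta + 2.0 * math.pi * k * n / pitch)
        exc.append(sample)
    return exc


def lpc_synthesis(exc, lpc, memory):
    """All-pole filtering: y[n] = e[n] - sum_j lpc[j-1] * y[n-j].

    memory holds the last len(lpc) output samples of the previous frame,
    most recent first, and serves as the initial filter state.
    Returns the synthesized samples and the updated memory.
    """
    mem = list(memory)
    out = []
    for e in exc:
        y = e - sum(a * m for a, m in zip(lpc, mem))
        out.append(y)
        mem = ([y] + mem)[:len(lpc)]  # shift the newest output into the state
    return out, mem
```

Passing the memory returned for frame i as the initial state for the extended frame gives continuity of the filter state across the frame boundary, which is the role the text assigns to the LPC memories.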
