Multimode Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems


concatenated and windowed with a 200-sample Kaiser window ($\beta = 6.0$) centred at the frame boundary. The harmonic phases, $\phi_k^i$, are estimated using a 512-point FFT.
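The following is a minimal Python sketch of this analysis step, using only the standard library. It builds the 200-sample Kaiser window with β = 6.0 and evaluates one bin of a 512-point DFT at the harmonic of interest. Time is measured from the window centre, so the estimated phase refers to the frame boundary. The 8 kHz sampling rate, the function names and the nearest-bin rule are assumptions made for illustration; the text does not specify them. A practical coder would compute the full FFT once and read every harmonic bin from it.

```python
import cmath
import math


def bessel_i0(x: float) -> float:
    """Modified Bessel function I0(x) via its power series sum ((x/2)^m / m!)^2."""
    total, term, m = 1.0, 1.0, 0
    while term > 1e-12 * total:
        m += 1
        term *= (x / (2.0 * m)) ** 2
        total += term
    return total


def kaiser_window(length: int, beta: float) -> list[float]:
    """Symmetric Kaiser window of `length` samples with shape parameter `beta`."""
    half = (length - 1) / 2.0
    norm = bessel_i0(beta)
    return [
        bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ((n - half) / half) ** 2))) / norm
        for n in range(length)
    ]


def harmonic_amp_phase(segment, f0_hz, k, fs_hz=8000.0, beta=6.0, nfft=512):
    """Estimate amplitude and phase of harmonic k from one windowed segment.

    Only the nfft-point DFT bin nearest k*f0 is evaluated (the segment must be
    no longer than nfft). Time is measured from the window centre, so the
    phase refers to the frame boundary at which the window is centred.
    """
    length = len(segment)
    w = kaiser_window(length, beta)
    centre = (length - 1) / 2.0
    m = round(k * f0_hz * nfft / fs_hz)        # nearest FFT bin to k*f0
    omega = 2.0 * math.pi * m / nfft           # bin frequency, rad/sample
    acc = sum(
        x * wn * cmath.exp(-1j * omega * (n - centre))
        for n, (x, wn) in enumerate(zip(segment, w))
    )
    amp = 2.0 * abs(acc) / sum(w)              # remove the window's DC gain
    return amp, cmath.phase(acc)


if __name__ == "__main__":
    fs, f0 = 8000.0, 125.0                     # 4th harmonic = 500 Hz = bin 32
    seg = [0.9 * math.cos(2 * math.pi * 4 * f0 * (n - 99.5) / fs + 0.7)
           for n in range(200)]
    print(harmonic_amp_phase(seg, f0, 4))      # approximately (0.9, 0.7)
```

Evaluating the DFT at the bin itself, rather than calling an FFT routine, keeps the sketch dependency-free. The amplitude is normalised by the window sum, so a unit-amplitude harmonic lying exactly on a bin reads back close to 1.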

Once the synthesized speech has been analysed, the original speech is windowed at three points, using the same window function as before: at the end of synthesis frame $i$, at the centre of synthesis frame $i+1$, and at the end of synthesis frame $i+1$. The corresponding harmonic amplitudes, $A_k^i$, $A_k^{i+1/2}$ and $A_k^{i+1}$, and phases, $\varphi_k^i$, $\varphi_k^{i+1/2}$ and $\varphi_k^{i+1}$, are estimated using 512-point FFTs. Then the signal component $s_l(n)$, which consists of the harmonics below 1 kHz, is synthesized by

$$s_l(n) = \sum_{k=1}^{L} A_k(n)\cos\big(\theta_k(n)\big), \qquad 0 \le n < N \tag{9.38}$$

where $L$ is the number of harmonics below 1 kHz at the end of the $i$th synthesis frame, $A_k(n)$ is obtained by linear interpolation between $A_k^i$, $A_k^{i+1/2}$ and $A_k^{i+1}$, and $\theta_k(n)$ is obtained by cubic phase interpolation [2] between $\varphi_k^i$, $\varphi_k^{i+1/2}$ and $\varphi_k^{i+1}$. Then the signal $s_m(n)$, which has modified phases, is synthesized by

$$s_m(n) = \sum_{k=1}^{L} A_k(n)\cos\big(\tilde{\theta}_k(n)\big), \qquad 0 \le n < N \tag{9.39}$$
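Equations (9.38) and (9.39) differ only in the phase track, so the shared structure can be sketched once. This minimal Python version interpolates each amplitude piecewise-linearly through its three estimates and sums the L harmonics below 1 kHz. It assumes that the three analysis points map to samples n = 0, N/2 and N of the frame. The function names, the 8 kHz rate and the 160-sample frame length are hypothetical. Phase tracks are passed in as callables, and the linear phases used in the demo are placeholders, not the cubic interpolation described in the text.

```python
import math


def harmonics_below(f0_hz: float, cutoff_hz: float = 1000.0) -> int:
    """Number L of harmonics k*f0 strictly below the cutoff (1 kHz in the text)."""
    return math.ceil(cutoff_hz / f0_hz) - 1


def interp_amplitude(a_start, a_mid, a_end, n, frame_len):
    """Piecewise-linear A_k(n) through A_k^i (n=0), A_k^{i+1/2} (n=N/2), A_k^{i+1} (n=N)."""
    half = frame_len / 2.0
    if n <= half:
        return a_start + (a_mid - a_start) * n / half
    return a_mid + (a_end - a_mid) * (n - half) / half


def synthesize_harmonics(amps, phase_tracks, frame_len):
    """Sum of A_k(n) * cos(phase_k(n)) over the given harmonics, 0 <= n < frame_len.

    amps: one (A_k^i, A_k^{i+1/2}, A_k^{i+1}) triple per harmonic.
    phase_tracks: one callable n -> phase (radians) per harmonic.
    """
    return [
        sum(interp_amplitude(*a, n, frame_len) * math.cos(theta(n))
            for a, theta in zip(amps, phase_tracks))
        for n in range(frame_len)
    ]


if __name__ == "__main__":
    fs, f0, frame_len = 8000.0, 125.0, 160     # hypothetical values
    L = harmonics_below(f0)                    # 7 harmonics: 125 ... 875 Hz
    w0 = 2 * math.pi * f0 / fs
    amps = [(1.0 / k, 1.0 / k, 1.0 / k) for k in range(1, L + 1)]
    # Placeholder linear phases; the text uses cubic phase interpolation.
    tracks = [lambda n, k=k: k * w0 * n for k in range(1, L + 1)]
    s_l = synthesize_harmonics(amps, tracks, frame_len)
    print(L, len(s_l))                         # 7 160
```

Passing the phase tracks as callables lets the same routine produce both $s_l(n)$ and $s_m(n)$ by swapping only the phase functions.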

Finally, the modified waveform-coding target of the $(i+1)$th synthesis frame is computed by

$$s_t(n) = s(n) - s_l(n) + s_m(n) \tag{9.40}$$
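The phase track of (9.39) runs from the synthesized phase $\phi_k^i$ to the original phase $\varphi_k^{i+1}$, with its slope pinned to the harmonic frequencies at both frame boundaries. The Python sketch below fills this in with the cubic phase interpolator of McAulay and Quatieri, which picks the integer number of $2\pi$ wraps that gives the smoothest track. That choice is an assumption: this section does not state the exact form used in [2]. The code also forms the target of (9.40) sample by sample. Function names, the 8 kHz rate, the 160-sample frame and the 125 Hz and 130 Hz boundary frequencies are hypothetical.

```python
import math


def cubic_phase_track(theta0, omega0, theta1, omega1, frame_len):
    """Cubic phase track theta(n) over one frame (McAulay-Quatieri form).

    theta(0) = theta0 and theta'(0) = omega0 (start: synthesized phase);
    theta(N) = theta1 + 2*pi*M and theta'(N) = omega1 (end: original phase),
    where the integer M is chosen to make the track maximally smooth.
    Phases in radians, frequencies in radians per sample.
    """
    T = float(frame_len)
    M = round(((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2.0)
              / (2.0 * math.pi))
    d = theta1 - theta0 - omega0 * T + 2.0 * math.pi * M   # unwrapped phase error
    alpha = 3.0 * d / T**2 - (omega1 - omega0) / T
    beta = -2.0 * d / T**3 + (omega1 - omega0) / T**2
    return lambda n: theta0 + omega0 * n + alpha * n**2 + beta * n**3


def waveform_target(s, s_l, s_m):
    """Modified target of (9.40): s_t(n) = s(n) - s_l(n) + s_m(n)."""
    return [a - b + c for a, b, c in zip(s, s_l, s_m)]


if __name__ == "__main__":
    fs, frame_len = 8000.0, 160                 # hypothetical values
    w_start = 2 * math.pi * 125.0 / fs          # harmonic frequency at n = 0
    w_end = 2 * math.pi * 130.0 / fs            # harmonic frequency at n = N
    track = cubic_phase_track(0.3, w_start, -2.0, w_end, frame_len)
    end_wraps = (track(frame_len) + 2.0) / (2 * math.pi)
    print(track(0), end_wraps)                  # 0.3 and (close to) an integer
```

The cubic has four coefficients, which is exactly enough to match a phase and a frequency at each end of the frame. The spread of the phase correction over the whole frame, instead of a jump at the boundary, is what removes the discontinuity.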

In (9.39), $\tilde{\theta}_k(n)$ is obtained by cubic phase interpolation between $\phi_k^i$ and $\varphi_k^{i+1}$. Thus the modified signal, $s_m(n)$, has the phases of the harmonically synthesized speech at the beginning of the frame and the phases of the original speech at the end of the frame. In other words, the rate of change of each harmonic phase, $\dot{\tilde{\theta}}_k(n)$, is modified within the frame so that the phase discontinuities are eliminated, while at the frame boundaries it is kept equal to the harmonic frequencies. Such phase modification could give the synthesized signal a reverberant character. However, large phase mismatches close to $\pi$ are rare, because SWPM minimizes the phase discontinuities. Furthermore, the modification is applied only to speech segments with pitch periods shorter than 80 samples, so any phase mismatch is smoothed out within a few pitch cycles. The listening tests confirm that the synthesized speech does not have a reverberant character. Restricting the phase modification to segments with pitch periods shorter than 80 samples also improves the accuracy of the spectral estimates, which use a 200-sample window. Figure 9.16
