Digital Signal Processing Reference
In-Depth Information
concatenated and windowed with a Kaiser window of 200 samples ($\beta = 6.0$)
centred at the frame boundary. The harmonic phases, $\phi_k^i$, are estimated using
a 512 point FFT.
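The analysis step just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the coder's actual implementation: the function names (`bessel_i0`, `kaiser_window`, `estimate_harmonic`) are hypothetical, the harmonic is read from the nearest bin of the zero-padded 512 point transform (computed here as a single DFT bin rather than a full FFT), and the phase is referred to the window centre, which is assumed to sit at the frame boundary.

```python
import cmath
import math


def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    term, total, m = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * m)) ** 2
        total += term
        m += 1
    return total


def kaiser_window(length, beta):
    """Kaiser window of the given length and shape parameter beta."""
    denom = bessel_i0(beta)
    half = (length - 1) / 2.0
    window = []
    for n in range(length):
        ratio = (n - half) / half
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))) / denom)
    return window


def estimate_harmonic(segment, omega0, k, beta=6.0, nfft=512):
    """Estimate amplitude and phase (at the window centre) of harmonic k.

    segment : samples under the window (200 in the text)
    omega0  : fundamental frequency in radians per sample (assumed known)
    """
    window = kaiser_window(len(segment), beta)
    # Nearest bin of the nfft-point transform to the harmonic frequency.
    bin_index = round(k * omega0 * nfft / (2.0 * math.pi))
    omega_bin = 2.0 * math.pi * bin_index / nfft
    # One bin of the zero-padded DFT of the windowed segment.
    spectrum = sum(w * x * cmath.exp(-1j * omega_bin * n)
                   for n, (w, x) in enumerate(zip(window, segment)))
    # Undo the window gain; the factor 2 accounts for the negative-frequency half.
    amplitude = 2.0 * abs(spectrum) / sum(window)
    # Move the phase reference from the first sample to the window centre.
    centre = (len(segment) - 1) / 2.0
    phase = cmath.phase(spectrum * cmath.exp(1j * omega_bin * centre))
    return amplitude, phase
```

Picking the nearest bin is a simplification; an implementation might instead interpolate between bins or search for the spectral peak near each harmonic.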
Having analysed the synthesized speech, the original speech is windowed
at three points: at the end of the synthesis frame $i$, at the centre of the synthesis
frame $i + 1$, and at the end of the synthesis frame $i + 1$, using the same window
function as before. The corresponding harmonic amplitudes, $A_k^i$, $A_k^{i+1/2}$, $A_k^{i+1}$,
and the phases $\varphi_k^i$, $\varphi_k^{i+1/2}$, $\varphi_k^{i+1}$ are estimated using 512 point FFTs. Then
the signal component $s_l(n)$, which consists of the harmonics below 1 kHz, is
synthesized by,

$$ s_l(n) = \sum_{k=1}^{L} A_k(n) \cos(\theta_k(n)) \quad \text{for } 0 \le n < N \tag{9.38} $$

where $L$ is the number of harmonics below 1 kHz at the end of the $i$th synthesis
frame, $A_k(n)$ is obtained by linear interpolation between $A_k^i$, $A_k^{i+1/2}$, and $A_k^{i+1}$,
and $\theta_k(n)$ is obtained by cubic phase interpolation [2] between $\varphi_k^i$, $\varphi_k^{i+1/2}$,
and $\varphi_k^{i+1}$. Then the signal $s_m(n)$, which has modified phases, is synthesized.
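The interpolation used in equation (9.38) can be illustrated with a short Python sketch. It rests on several assumptions that the text does not confirm: the cubic phase polynomial is the form commonly used in sinusoidal coding (matching phase and frequency at both ends of a segment, with the smoothest $2\pi$ unwrapping), each frame is handled as two half-frame segments joined at the centre measurement, the frequency of harmonic $k$ is $k$ times a fundamental supplied at each of the three analysis points, and $N$ is even. All function and parameter names are hypothetical.

```python
import math


def cubic_phase_track(theta0, omega0, theta1, omega1, T):
    """Cubic phase from (theta0, omega0) at n = 0 to (theta1, omega1) at n = T.

    Returns the phase at n = 0 .. T-1. The integer M selects the 2*pi
    unwrapping of theta1 that gives the smoothest frequency track.
    """
    two_pi = 2.0 * math.pi
    M = round(((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2.0) / two_pi)
    D = theta1 - theta0 - omega0 * T + two_pi * M
    alpha = 3.0 * D / T ** 2 - (omega1 - omega0) / T
    beta = -2.0 * D / T ** 3 + (omega1 - omega0) / T ** 2
    return [theta0 + omega0 * n + alpha * n ** 2 + beta * n ** 3 for n in range(T)]


def linear_track(a0, a1, T):
    """Linear interpolation from a0 at n = 0 towards a1 at n = T (n = 0 .. T-1)."""
    return [a0 + (a1 - a0) * n / T for n in range(T)]


def synthesize_low_band(amps, phases, fundamentals, N):
    """Sum of interpolated harmonics over one frame of N samples (N even).

    amps[k-1], phases[k-1] : (start, centre, end) values for harmonic k
    fundamentals           : (start, centre, end) fundamental in rad/sample
    """
    half = N // 2
    out = [0.0] * (2 * half)
    for k, (a, p) in enumerate(zip(amps, phases), start=1):
        for seg in range(2):  # first and second half of the frame
            amp = linear_track(a[seg], a[seg + 1], half)
            theta = cubic_phase_track(p[seg], k * fundamentals[seg],
                                      p[seg + 1], k * fundamentals[seg + 1], half)
            for n in range(half):
                out[seg * half + n] += amp[n] * math.cos(theta[n])
    return out
```

For a stationary harmonic whose measured phases are consistent with its frequency, the cubic terms vanish and the routine reduces to an ordinary cosine. The same phase track, started from a different phase at the frame boundary, could be reused for a phase-modified version of the signal.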
$$ s_m(n) = \sum_{k=1}^{L} A_k(n) \cos(\hat{\theta}_k(n)) \quad \text{for } 0 \le n < N \tag{9.39} $$

and, finally, the modified waveform-coding target of the $(i+1)$th synthesis
frame is computed by,

$$ s_t(n) = s(n) - s_l(n) + s_m(n) \tag{9.40} $$

where $\hat{\theta}_k(n)$ is obtained by cubic phase interpolation between $\phi_k^i$ and
$\varphi_k^{i+1}$. Thus the modified signal, $s_m(n)$, has the phases of the harmonically-synthesized
speech at the beginning of the frame and the phases of the original
speech at the end of the frame. In other words, $\dot{\hat{\theta}}_k(n)$ (the rate of change
of each harmonic phase) is modified such that the phase discontinuities are
eliminated, by keeping $\dot{\hat{\theta}}_k(n)$ equal to the harmonic frequencies at the frame
boundaries. There is a possibility that such phase modification operations
induce a reverberant character in the synthesized signals. However, large
phase mismatches close to π are rare, because SWPM minimizes the phase
discontinuities. Furthermore, the modifications are applied only to
speech segments that have pitch periods shorter than 80 samples, so
a phase mismatch is smoothed out within a few pitch cycles. The listening
tests confirm that the synthesized speech does not possess a reverberant
character. Limiting the phase modification process to the segments with
pitch periods shorter than 80 samples also improves the accuracy of the
spectral estimations, which use a window length of 200 samples. Figure 9.16