Digital Signal Processing Reference
concatenated and windowed with a Kaiser window of 200 samples
($\beta = 6.0$) centred at the frame boundary. The harmonic phases,
$\phi_k^i$, are estimated using a 512-point FFT.
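The estimation step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the function name `estimate_harmonics` is hypothetical, each harmonic is simply read from the nearest FFT bin, and the phases are referred to the centre of the 200-sample window by removing the linear phase term that arises because the window starts at sample 0.

```python
import numpy as np

def estimate_harmonics(segment, pitch_period, n_fft=512, beta=6.0):
    """Return (amplitudes, phases) of the pitch harmonics of `segment`.

    Phases are referred to the centre of the analysis window.
    """
    segment = np.asarray(segment, dtype=float)
    m = len(segment)                          # 200 samples in the text
    window = np.kaiser(m, beta)               # Kaiser window, beta = 6.0
    spectrum = np.fft.rfft(segment * window, n_fft)

    # Harmonics strictly below the Nyquist frequency.
    n_harm = int(np.ceil(pitch_period / 2.0)) - 1
    k = np.arange(1, n_harm + 1)
    # Simplification (assumption): sample each harmonic at the nearest bin.
    bins = np.rint(k * n_fft / pitch_period).astype(int)

    omega = 2.0 * np.pi * bins / n_fft        # bin frequency in rad/sample
    centre = (m - 1) / 2.0
    # Undo the linear phase caused by the window starting at n = 0.
    centred = spectrum[bins] * np.exp(1j * omega * centre)
    amplitudes = 2.0 * np.abs(centred) / window.sum()
    phases = np.angle(centred)
    return amplitudes, phases

# Usage: two harmonics of a 64-sample pitch period, phases defined at the
# window centre (sample 99.5 of a 200-sample segment).
n = np.arange(200)
x = (1.0 * np.cos(2 * np.pi / 64 * (n - 99.5) + 0.5)
     + 0.5 * np.cos(4 * np.pi / 64 * (n - 99.5) - 1.0))
amps, phases = estimate_harmonics(x, pitch_period=64)
```

With a pitch period of 64 samples the harmonics fall exactly on bins of the 512-point FFT, so the estimates are close to the true values; for other pitch periods the nearest-bin shortcut adds a small error.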
Having analysed the synthesized speech, the original speech is windowed
at three points: at the end of the synthesis frame $i$, at the centre of
the synthesis frame $i+1$, and at the end of the synthesis frame $i+1$,
using the same window function as before. The corresponding harmonic
amplitudes, $A_k^i$, $A_k^{i+1/2}$, $A_k^{i+1}$, and the phases
$\varphi_k^i$, $\varphi_k^{i+1/2}$, $\varphi_k^{i+1}$, are estimated using
512-point FFTs. Then the signal component $s_l(n)$, which consists of the
harmonics below 1 kHz, is synthesized by

$$s_l(n) = \sum_{k=1}^{L} A_k(n) \cos(\theta_k(n)) \quad \text{for } 0 \le n < N \qquad (9.38)$$
where $L$ is the number of harmonics below 1 kHz at the end of the $i$th
synthesis frame, $A_k(n)$ is obtained by linear interpolation between
$A_k^i$, $A_k^{i+1/2}$, and $A_k^{i+1}$, and $\theta_k(n)$ is obtained by
cubic phase interpolation [2] between $\varphi_k^i$, $\varphi_k^{i+1/2}$,
and $\varphi_k^{i+1}$. Then the signal $s_m(n)$, which has modified
phases, is synthesized:
$$s_m(n) = \sum_{k=1}^{L} A_k(n) \cos(\hat{\theta}_k(n)) \quad \text{for } 0 \le n < N \qquad (9.39)$$
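The harmonic sums in (9.38) and (9.39) share one structure: amplitudes interpolated linearly and phases interpolated by a cubic polynomial. The sketch below illustrates that structure for one segment between two analysis points. It assumes the widely used cubic that matches phase and frequency at both ends and picks the $2\pi$ unwrapping giving the smoothest track; whether this is exactly the method of [2] is an assumption. The function names and the 160-sample segment length are hypothetical.

```python
import numpy as np

def cubic_phase_track(theta0, omega0, theta1, omega1, n_samples):
    """Cubic phase tracks, shape (harmonics, n_samples), for n = 0..n_samples-1.

    Each track starts at phase theta0 with slope omega0 and is constructed
    so that at n = n_samples it reaches theta1 (modulo 2*pi) with slope omega1.
    """
    theta0, omega0, theta1, omega1 = (
        np.atleast_1d(np.asarray(v, dtype=float))[:, None]
        for v in (theta0, omega0, theta1, omega1))
    T = float(n_samples)
    # Integer number of 2*pi turns that gives the smoothest track.
    M = np.rint((theta0 + omega0 * T - theta1 + (omega1 - omega0) * T / 2.0)
                / (2.0 * np.pi))
    d = theta1 + 2.0 * np.pi * M - theta0 - omega0 * T
    c2 = 3.0 * d / T**2 - (omega1 - omega0) / T
    c3 = -2.0 * d / T**3 + (omega1 - omega0) / T**2
    n = np.arange(n_samples, dtype=float)[None, :]
    return theta0 + omega0 * n + c2 * n**2 + c3 * n**3

def synthesize_harmonics(amp0, amp1, theta0, theta1, w0_start, w0_end,
                         n_samples):
    """Sum of harmonics with linearly interpolated amplitudes and cubic
    phase tracks; harmonic k has frequency k * w0 at the segment ends."""
    amp0 = np.asarray(amp0, dtype=float)
    amp1 = np.asarray(amp1, dtype=float)
    k = np.arange(1, len(amp0) + 1)
    phase = cubic_phase_track(theta0, k * w0_start, theta1, k * w0_end,
                              n_samples)
    frac = np.arange(n_samples) / float(n_samples)
    amp = amp0[:, None] + (amp1 - amp0)[:, None] * frac[None, :]
    return np.sum(amp * np.cos(phase), axis=0)

# Usage: two harmonics, 50-sample pitch period, 160-sample segment
# (illustrative values only).
w0 = 2.0 * np.pi / 50.0
segment = synthesize_harmonics([1.0, 0.5], [0.8, 0.4],
                               [0.3, -1.0], [1.2, 0.4], w0, w0, 160)
```

Calling `synthesize_harmonics` once per half frame, with the mid-frame parameters as the shared boundary, would cover a three-point interpolation of the kind described for $s_l(n)$; a single call spanning the whole frame covers a two-point case.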
Finally, the modified waveform-coding target of the $(i+1)$th synthesis
frame is computed by

$$s_t(n) = s(n) - s_l(n) + s_m(n) \qquad (9.40)$$
where $\hat{\theta}_k(n)$ in (9.39) is obtained by cubic phase
interpolation between $\phi_k^i$ and $\varphi_k^{i+1}$. Thus the modified
signal, $s_m(n)$, has the phases of the harmonically-synthesized speech at
the beginning of the frame and the phases of the original speech at the
end of the frame. In other words, $\dot{\hat{\theta}}_k(n)$ (the rate of
change of each harmonic phase) is modified such that the phase
discontinuities are eliminated, by keeping $\dot{\hat{\theta}}_k(n)$ equal
to the harmonic frequencies at the frame boundaries. There is a
possibility that such phase modification operations induce a reverberant
character in the synthesized signals. However, large
phase mismatches close to $\pi$
are rare, because SWPM minimizes the phase
discontinuities. Furthermore, the modifications are applied only to
speech segments with pitch periods shorter than 80 samples, so a phase
mismatch is smoothed out within a few pitch cycles. The listening tests
confirm that the synthesized speech does not possess a reverberant
character. Limiting the phase modification process to segments with
pitch periods shorter than 80 samples also improves the accuracy of the
spectral estimations, which use a window length of 200 samples. Figure 9.16