Digital Signal Processing Reference
envelope on the assumption that it is the gain response of a minimum-phase
transfer function, and $L_v$ is the index of the harmonic just below the voicing
transition frequency.
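To illustrate how the harmonic indices divide between the two bands, the Python sketch below computes $L_v$ and the number of harmonics below the Nyquist frequency. Harmonics $l = 1, \dots, L_v$ are treated as voiced and the rest as unvoiced. The helper name `voicing_split`, the 8 kHz default sampling rate, the use of Hz rather than rad/sample, and the reading of "just below" as "the largest $l$ with $l f_0 \le f_c$" are all assumptions made for illustration.

```python
import math


def voicing_split(f0_hz: float, fc_hz: float, fs_hz: float = 8000.0):
    """Hypothetical helper: return (L_v, K) for pitch f0_hz and voicing
    transition frequency fc_hz.

    Harmonics l = 1..L_v are treated as voiced, l = L_v+1..K as unvoiced.
    In Hz, (fs/2)/f0 is the same ratio as pi/w0 with w0 = 2*pi*f0/fs
    in rad/sample.
    """
    # Number of harmonics that fit below the Nyquist frequency
    K = math.floor((fs_hz / 2.0) / f0_hz)
    # Last harmonic at or below the voicing transition (assumed reading)
    L_v = min(math.floor(fc_hz / f0_hz), K)
    return L_v, K
```

For example, `voicing_split(200.0, 2000.0)` returns `(10, 20)`: ten voiced harmonics below 2 kHz and ten unvoiced ones between 2 kHz and 4 kHz.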
The upper part of the spectrum, which is declared as unvoiced, is synthesized
as follows:

$$\hat{s}_{uv}(n) = \sum_{l=L_v+1}^{K(\omega_0)} A(l\omega_0)\exp\{j[l\phi_0(n) + \Phi_s(l\omega_0) + U[-\pi,\pi]]\} \tag{8.17}$$
where $K(\omega_0) = \lfloor \pi/\omega_0 \rfloor$ and $U[-\pi,\pi]$ denotes a uniformly
distributed random variable in the range between $-\pi$ and $\pi$. When a frame
is fully unvoiced, the pitch estimate is meaningless, and pitch frequencies
greater than 150 Hz may degrade the perceptual quality of unvoiced speech. In
order to synthesize the noise-like unvoiced speech with adequate quality, the
number of sinusoids with random phases should be sufficiently large. Therefore,
the pitch frequency is set to 100 Hz for unvoiced speech; since $K(\omega_0)$
grows as $\omega_0$ falls, a lower pitch places more sinusoids below the Nyquist
frequency. The synthesized speech of the $k$th frame is then given by,
$$\hat{s}(n) = \hat{s}_v(n) + \hat{s}_{uv}(n) \tag{8.18}$$
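Eqs. (8.17) and (8.18) can be sketched in Python for a single frame. This is a minimal illustration under several assumptions not stated in the text: the pitch is constant over the frame, so $\phi_0(n) = \phi_{\text{onset}} + n\omega_0$; one random phase is drawn per unvoiced harmonic per frame; the voiced term has the same form as (8.17) without the random phase; the envelope magnitude `A` and phase `phi_s` are passed in as hypothetical callables; and the real part of the complex sum (a sum of cosines) is returned. The function name `synth_frame` is also hypothetical.

```python
import math
import random
from typing import Callable, List, Optional


def synth_frame(A: Callable[[float], float],
                phi_s: Callable[[float], float],
                w0: float,
                L_v: int,
                n_samples: int,
                phi_onset: float = 0.0,
                rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical sketch of Eqs. (8.17)-(8.18); w0 in rad/sample."""
    rng = rng if rng is not None else random.Random()
    # K(w0) = floor(pi / w0); the small epsilon guards against round-off
    K = math.floor(math.pi / w0 + 1e-9)
    L_v = min(L_v, K)
    # U[-pi, pi]: one random phase per unvoiced harmonic (assumed per frame)
    u = {l: rng.uniform(-math.pi, math.pi) for l in range(L_v + 1, K + 1)}
    frame = []
    for n in range(n_samples):
        phi0 = phi_onset + n * w0  # assumed linear phase of the fundamental
        # Voiced band, l = 1..L_v (assumed same form without random phase)
        s_v = sum(A(l * w0) * math.cos(l * phi0 + phi_s(l * w0))
                  for l in range(1, L_v + 1))
        # Unvoiced band, l = L_v+1..K(w0), Eq. (8.17)
        s_uv = sum(A(l * w0) * math.cos(l * phi0 + phi_s(l * w0) + u[l])
                   for l in range(L_v + 1, K + 1))
        frame.append(s_v + s_uv)  # Eq. (8.18)
    return frame
```

For a fully unvoiced frame the text fixes the pitch at 100 Hz. With an assumed 8 kHz sampling rate that is `w0 = 2 * math.pi * 100 / 8000`, giving 40 harmonics below Nyquist compared with 26 at 150 Hz, and calling `synth_frame` with `L_v=0` exercises only the random-phase branch.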
The overlap-and-add method is used with a triangular window to produce the
final speech output. Therefore, the frame length $N$ is equal to twice the
interval between consecutive analysis points, so that every output sample is
covered by two overlapping triangular windows. The frequency response of the
spectral envelope is given by,
$$H(\omega) = A(\omega)\exp[j\Phi_s(\omega)] \tag{8.19}$$
which is approximated by an all-pole model,
$$H(\omega) = \frac{g}{1 - \sum_{i=1}^{p} a_i z^{-i}} \quad \text{for } z = e^{j\omega} \tag{8.20}$$
where $g$ is the gain and $a_i$ are the predictor coefficients. In conventional
time-domain all-pole LPC analysis, which is performed on the original speech
signal, the filter order is usually limited to half the smallest pitch period
(in samples). This limitation ensures that the LPC filter models the formant
spectral envelope, since LPC filters with a large number of taps tend to
resolve the harmonic structure. In STC, however, all-pole modelling is applied
to the estimated spectral envelope rather than to the speech signal. Hence, the
filter order is not restricted and can be increased depending only on the
desired accuracy of the spectral