Digital Signal Processing Reference
In-Depth Information
The pitch is used to change the control flag for different reasons. For male
speech with long pitch periods, ACELP produces better quality than the
harmonic coders even at stationary voiced segments. When the pitch period
is long, ACELP needs fewer pulses in the time domain to track the changes
in the speech waveform while the harmonic coders have to encode a large
number of harmonics in the frequency domain. Furthermore, it is well-known
that speech-coding schemes which preserve the phase accurately work better
for male speech, while the harmonic coders which encode only the amplitude
spectrum result in better quality for female speech [24].
9.6.3 PlosiveDetection
The unvoiced synthesis process described in Section 9.5.5 updates the
unvoiced gain every 20ms. While this is sufficient for fricatives, it reduces the
quality of the highly nonstationary unvoiced components such as plosives.
The listening tests show that synthesizing plosives using ACELP preserves
the sharpness of the synthesized speech and improves the perceptual quality.
Therefore a special case is required to detect the plosives, which are classified
as unvoiced by the initial classification, and synthesize them using ACELP.
Plosives are characterized by isolated pulse-like signals with a sharp rise
in energy, and this feature is used to distinguish them from the fricatives.
The rms energy, e j , of the speech signal is computed for every 10 samples as
follows:
9
s 2 10 j
n
+
n
=
0
e j =
for 0
j < 15
(9.54)
10
A plosive detection metric, p j ,isdefinedas,
e j
8 e j 1
p j
=
(9.55)
where e
1 is the final energy term of the previous frame. A frame is classified
as containing a plosive if p j > 1 for at least one j . This algorithm may signal
a plosive even when the overall energy level is very low, for example at a
silence segment, if it detects a large fluctuation in energy. Those low-energy
segments are completely ignored when using the tracked energy term, t e ,in
the open-loop initial classification.
It should be noted that the scope of this plosive detection algorithm is
reduced to unvoiced segments, since the segments that include the unvoiced
plosives are already identified by the initial classification. The plosive detec-
Search WWH ::




Custom Search