Digital Signal Processing Reference
[Figure: speech spectrum, voicing likelihood and threshold function plotted against frequency (Hz), 0 to 4000]
Figure 6.34 Original speech spectrum with voicing likelihood and threshold function; the voicing cut-off frequency is indicated by the vertical dashed line
into account the difference between V(l) (the voicing likelihood) and the
threshold T(l), which replaces the hard decision used in MBE mixed-voicing
with a soft decision in each band. An example of a voicing likelihood and
threshold function is shown in Figure 6.34.
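As a rough illustration of the soft decision described above, the following Python sketch marks a band as voiced when the weighted sum of the margins V(l) - T(l) over its bins is positive. The function name, the weights, the band layout, the toy data and the choice of zero as the decision point are all assumptions made for illustration; the text does not specify them.

```python
def soft_band_decisions(likelihood, threshold, weights, band_edges):
    """Return one voiced (True) / unvoiced (False) flag per band.

    likelihood, threshold, weights: equal-length sequences indexed by bin l.
    band_edges: bin indices delimiting the bands; [0, 4, 8] gives the
    bands [0, 4) and [4, 8). All names here are hypothetical.
    """
    if not (len(likelihood) == len(threshold) == len(weights)):
        raise ValueError("likelihood, threshold and weights must have equal length")
    decisions = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        # Weighted sum of the margin V(l) - T(l) over the bins of this band.
        score = sum(
            weights[l] * (likelihood[l] - threshold[l]) for l in range(start, stop)
        )
        # Assumed decision rule: a positive total margin means voiced.
        decisions.append(score > 0.0)
    return decisions


# Toy data, made up for illustration: 8 bins, two bands of 4 bins each.
V = [0.9, 0.8, 0.4, 0.9, 0.3, 0.6, 0.2, 0.1]
T = [0.5] * 8
W = [1.0] * 8  # uniform weights, assumed
flags = soft_band_decisions(V, T, W, [0, 4, 8])
```

In this toy input the third bin of the first band lies below the threshold, yet the band as a whole is still declared voiced because the other bins exceed the threshold by a larger margin. A hard per-bin comparison would not allow that trade-off.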
This weighted-sum approach can also be applied to the voicing measure
used in MBE. However, the MBE approach requires the computation and
generation of a synthetic spectrum, as described above, which the voicing
likelihood method discussed here does not need. As with the MBE and
MELP voicing-decision algorithms, the most important stage of split-band
voicing estimation is the calculation of the threshold function. A threshold
computed from only a limited number of speech characteristics
does not lead to good voicing determination. For example, the energy alone
is not a reliable enough voicing indicator, since there can be high-energy
unvoiced speech segments and low-level voiced speech. The peakiness factor
is not entirely reliable either: single spikes can lead to high peakiness, but
they should be declared as unvoiced for optimal speech quality. Likewise,
the periodic similarity measure has its limits: when the pitch varies, the