Figure 6.30 Reading from the top: St, Pr, and Ps voicing parameters (dotted for noisy speech), and the original and noisy speech waveforms
will be voiced and unvoiced? This leads to an adaptive mixed-voicing decision
process, which has been used in MBE, MELP, and similar coders.
6.3.2 Soft-Decision Voicing
Although fully voiced and fully unvoiced frames can be identified in the
time domain by using the voicing parameters discussed above, in the case
of noisy speech this becomes more difficult and more mistakes are made.
In order to avoid this problem and to deal with the mixed frames in one
process, a frequency-domain voicing-decision process is more appropriate.
The mixed voicing-decision process usually makes use of the harmonic and
random structures of voiced and unvoiced sounds in the frequency domain.
For example, in MBE-based coders, a synthetic spectrum constructed from the
measured pitch of the frame is compared with the original spectrum, and the
degree of match is evaluated across frequency. Frequency bands where the
match is good are declared voiced and the
rest are classified as unvoiced. In the case of MELP, the input frame is first
split into subbands and the long-term correlation in each band is measured to
classify the band as voiced (high correlation) or unvoiced (low correlation).
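As an illustration of this per-band approach, the sketch below implements a MELP-style subband voicing measure: each band is flagged voiced when its normalized autocorrelation at the pitch lag is high. The band edges, filter order, threshold, and function name are illustrative assumptions rather than values taken from the MELP standard; an MBE-style decision would replace the correlation measure with a spectral match error against the pitch-derived synthetic spectrum.

import numpy as np
from scipy.signal import butter, lfilter

def subband_voicing(frame, pitch_lag, fs,
                    band_edges=(0, 500, 1000, 2000, 3000, 4000),
                    threshold=0.5):
    """MELP-style soft voicing sketch (illustrative parameters):
    split the frame into subbands and flag a band as voiced when its
    normalized autocorrelation at the pitch lag is high."""
    nyq = fs / 2.0
    decisions = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Band-pass (low-pass for the first band); clamp the upper edge
        # just below Nyquist so the filter design stays valid.
        hi_n = min(hi / nyq, 0.99)
        if lo == 0:
            b, a = butter(4, hi_n, btype="lowpass")
        else:
            b, a = butter(4, [lo / nyq, hi_n], btype="bandpass")
        sub = lfilter(b, a, frame)

        # Normalized long-term correlation at the pitch lag: close to 1
        # for a periodic (voiced) band, close to 0 for a noise-like band.
        x0, x1 = sub[:-pitch_lag], sub[pitch_lag:]
        corr = np.dot(x0, x1) / (np.sqrt(np.dot(x0, x0) * np.dot(x1, x1)) + 1e-12)
        decisions.append(bool(corr > threshold))
    return decisions

# Example: a 25 ms frame of "voiced" speech approximated by a 100 Hz tone plus noise.
fs = 8000
t = np.arange(200) / fs
frame = np.sin(2 * np.pi * 100 * t) + 0.1 * np.random.randn(len(t))
print(subband_voicing(frame, pitch_lag=fs // 100, fs=fs))

A coder can also keep the per-band correlation values themselves as soft voicing strengths rather than applying the hard threshold used in this sketch.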