Figure 6.30 Reading from the top: St, Pr, and Ps voicing parameters (dotted for noisy speech), and the original and noisy speech waveforms
will be voiced and unvoiced? This leads to an adaptive mixed-voicing decision
process, which has been used in MBE, MELP, and similar coders.
6.3.2 Soft-Decision Voicing
Although fully voiced and fully unvoiced frames can be identified in the
time domain by using the voicing parameters discussed above, in the case
of noisy speech this becomes more difficult and more mistakes are made.
In order to avoid this problem and to deal with the mixed frames in one
process, a frequency-domain voicing-decision process is more appropriate.
The mixed voicing-decision process usually makes use of the harmonic and
random structures of voiced and unvoiced sounds in the frequency domain.
For example, in MBE-based coders, a synthetic spectrum constructed from the
measured pitch of the frame is compared with the original spectrum, and the
degree of match is evaluated across frequency. Frequency bands where the
match is good are declared voiced and the
rest are classified as unvoiced. In the case of MELP, the input frame is first
split into subbands and the long-term correlation in each band is measured to
classify the band as voiced (high correlation) or unvoiced (low correlation).
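As an illustration of this per-band approach, the sketch below implements a MELP-style subband voicing measure: each band is flagged voiced when its normalized autocorrelation at the pitch lag is high. The band edges, filter order, threshold, and function name are illustrative assumptions rather than values taken from the MELP standard; an MBE-style decision would replace the correlation measure with a spectral match error against the pitch-derived synthetic spectrum.

import numpy as np
from scipy.signal import butter, lfilter

def subband_voicing(frame, pitch_lag, fs,
                    band_edges=(0, 500, 1000, 2000, 3000, 4000),
                    threshold=0.5):
    """MELP-style soft voicing sketch (illustrative parameters):
    split the frame into subbands and flag a band as voiced when its
    normalized autocorrelation at the pitch lag is high."""
    nyq = fs / 2.0
    decisions = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Band-pass (low-pass for the first band); clamp the upper edge
        # just below Nyquist so the filter design stays valid.
        hi_n = min(hi / nyq, 0.99)
        if lo == 0:
            b, a = butter(4, hi_n, btype="lowpass")
        else:
            b, a = butter(4, [lo / nyq, hi_n], btype="bandpass")
        sub = lfilter(b, a, frame)

        # Normalized long-term correlation at the pitch lag: close to 1
        # for a periodic (voiced) band, close to 0 for a noise-like band.
        x0, x1 = sub[:-pitch_lag], sub[pitch_lag:]
        corr = np.dot(x0, x1) / (np.sqrt(np.dot(x0, x0) * np.dot(x1, x1)) + 1e-12)
        decisions.append(bool(corr > threshold))
    return decisions

# Example: a 25 ms frame of "voiced" speech approximated by a 100 Hz tone plus noise.
fs = 8000
t = np.arange(200) / fs
frame = np.sin(2 * np.pi * 100 * t) + 0.1 * np.random.randn(len(t))
print(subband_voicing(frame, pitch_lag=fs // 100, fs=fs))

A coder can also keep the per-band correlation values themselves as soft voicing strengths rather than applying the hard threshold used in this sketch.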