Pitch Estimation and Voiced–Unvoiced Classification of Speech - Digital Speech: Coding for Low Bit Rate Communication Systems


segments do not exhibit such characteristics. In some parts of speech, in addition to a varying pitch period, the signal may contain a mixture of voiced (periodic) and unvoiced (random) components, which may cause estimation errors. Formant interaction can also be a problem: the speech may become highly resonant, which may lead to incorrect pitch estimates. Onsets and offsets are also problem areas. Finally, large amounts of background noise in the signal can further complicate the task of the PDA.

PDAs are generally classified into two main categories: time-domain and frequency-domain techniques. However, in the last few years, more complicated techniques that use both time- and frequency-domain characteristics of speech have been developed. These are summarized below.

6.2.1 Time-Domain PDAs

The most obvious feature of periodic signals is the similarity of the waveform at different times. The main principle of pitch detection algorithms (PDAs) which rely on time-domain waveform similarities is to find the pitch period by comparing the similarity between the original signal and its shifted version. If the shifted distance is equal to the pitch period, the two signal waveforms should have the greatest similarity. The majority of existing PDAs are based on this concept. Among them, the average magnitude difference function (AMDF) and the autocorrelation (AC) method are the two most widely used.

Average Magnitude Difference Method

A simple way to compare the current speech with its time-delayed version is

to compute the average magnitude difference function (AMDF) [4] given by:

$$A(\tau) = \sum_{n=0}^{N-1} \left| s(n) - s(n-\tau) \right| \tag{6.1}$$

where τ is the lag. This function is computed over a predetermined range of τ, and the value of τ that minimizes A(τ) is selected as the pitch period. The value of N is typically 160 samples, corresponding to a 20 ms speech frame at an 8 kHz sampling rate. A plot of the AMDF against the speech signal is shown in Figure 6.1. The main advantage of the AMDF is that it only requires additions and subtractions, making it very suitable for hardware implementation. However, current DSPs normally offer a one-cycle multiply-add instruction, making this advantage less significant. The performance of the AMDF is relatively poor and, in particular, it does not cater for variations in the energy of the speech.
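The AMDF search described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The function names, the lag range of 20 to 147 samples, and the sawtooth test signal are all assumptions made for the example. Because Equation (6.1) uses s(n − τ), the frame must be preceded by at least as many past samples as the largest lag searched.

```python
def amdf(s, start, n_frame, tau):
    """Equation (6.1): sum over n = 0..N-1 of |s(n) - s(n - tau)|,
    with n counted from index `start` of the sample list `s`."""
    return sum(abs(s[start + n] - s[start + n - tau]) for n in range(n_frame))


def amdf_pitch(s, start, n_frame=160, tau_min=20, tau_max=147):
    """Return (best_lag, values), where best_lag minimizes the AMDF.

    The lag range is an assumed example, not taken from the text.
    Ties are resolved in favour of the smallest lag.
    """
    if start < tau_max or start + n_frame > len(s):
        raise ValueError("need tau_max past samples and a full frame")
    values = {tau: amdf(s, start, n_frame, tau)
              for tau in range(tau_min, tau_max + 1)}
    best_lag = min(values, key=values.get)
    return best_lag, values


# Hypothetical test signal: an integer sawtooth with a period of 50 samples
# (160 Hz at an assumed 8 kHz sampling rate).
signal = [(n % 50) - 25 for n in range(400)]
lag, values = amdf_pitch(signal, start=200)
print(lag)  # 50
```

For an exactly periodic signal the AMDF is zero at every multiple of the period, so the sketch reports the smallest such lag inside the search range. Only additions, subtractions, and absolute values appear in the inner loop, which reflects the hardware advantage noted above.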
