Digital Signal Processing Reference
segments do not exhibit such characteristics. In some parts of speech, besides
the pitch period varying, the signal may contain a mixture of voiced
(periodic) and unvoiced (random) components, which may cause estimation
errors. Formant interaction can also be a problem, as the speech may become
highly resonant and this may cause incorrect pitch estimation. Onsets and
offsets are also problem areas. Finally, large amounts of background noise
in the signal can complicate the task of the PDA.
PDAs are generally classified into two main categories: time-domain and
frequency-domain techniques. However, in the last few years, more complicated
techniques that use both time- and frequency-domain characteristics of speech
have been developed. These are summarized below.
6.2.1 Time-Domain PDAs
The most obvious feature of periodic signals is the similarity of the waveform
at different times. The main principle of pitch detection algorithms (PDAs)
which rely on time-domain waveform similarities is to find the pitch period by
comparing the similarity between the original signal and its shifted version.
If the shifted distance is equal to the pitch period, the two signal waveforms
should have the greatest similarity. The majority of existing PDAs are based
on this concept. Among them, the average magnitude difference function
(AMDF) and the autocorrelation (AC) method are the two most widely
used.
Average Magnitude Difference Method
A simple way to compare the current speech with its time-delayed version is
to compute the average magnitude difference function (AMDF) [4] given by:
$$A(\tau) = \sum_{n=0}^{N-1} \left| s(n) - s(n-\tau) \right| \qquad (6.1)$$
where τ is the lag. The function is computed over a predetermined range of
τ, and the value of τ that minimizes A(τ) is selected as the pitch period.
The value of N is typically 160 samples, corresponding to a 20 ms speech
frame at an 8 kHz sampling rate. A plot of the AMDF against the speech
signal is shown in Figure 6.1. The main advantage of the AMDF is that it
requires only additions and subtractions, making it very suitable for
hardware implementation. However, current DSPs normally offer a one-cycle
multiply-add instruction, which makes this advantage less significant. The
performance of the AMDF is relatively poor and, in particular, it does not
cater for variations in the energy of the speech.
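
The AMDF search described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an implementation taken from the text: the function names `amdf` and `amdf_pitch_period`, the lag range of 20 to 147 samples (roughly 54 to 400 Hz at 8 kHz), and the synthetic two-harmonic test signal are all assumptions chosen for the example. The frame is assumed to start far enough into the buffer that s(n − τ) is available for the largest lag.

```python
import math


def amdf(s, start, n_frame, tau):
    """Sum of |s(n) - s(n - tau)| over one frame, following Eq. (6.1).

    `start` is the buffer index of the first sample of the frame (n = 0).
    """
    return sum(abs(s[start + n] - s[start + n - tau]) for n in range(n_frame))


def amdf_pitch_period(s, start, n_frame=160, tau_min=20, tau_max=147):
    """Return the lag in [tau_min, tau_max] that minimises the AMDF.

    The lag range is an assumption for this sketch, not a value from the text.
    """
    if start < tau_max or start + n_frame > len(s):
        raise ValueError("need tau_max samples of history and a full frame")
    # min() returns the first lag with the smallest AMDF value.
    return min(range(tau_min, tau_max + 1),
               key=lambda tau: amdf(s, start, n_frame, tau))


# Hypothetical test signal: a perfectly periodic waveform with a period of
# 80 samples (100 Hz at an assumed 8 kHz sampling rate), two harmonics.
period = 80
signal = [math.sin(2 * math.pi * n / period)
          + 0.5 * math.sin(4 * math.pi * n / period)
          for n in range(400)]

estimated = amdf_pitch_period(signal, start=160)
print("estimated pitch period (samples):", estimated)
```

Keeping the maximum lag below twice the true period avoids the second null of the AMDF at 2 × 80 = 160 samples in this example. Real speech is only quasi-periodic, so its nulls are shallower, and the sketch makes no attempt to handle pitch doubling or frame-to-frame energy changes.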