At the transmitter, a segment of speech is analyzed. The parameters obtained include a
decision as to whether the segment of speech is voiced or unvoiced, the pitch period if the
segment is declared voiced, and the parameters of the vocal tract filter. In this section, we will
take a somewhat detailed look at the various components that make up the linear predictive
coder. As an example, we will use the specifications for the 2.4-kbit/s U.S. Government standard
LPC-10.
The input speech is generally sampled at 8000 samples per second. In the LPC-10 standard,
the speech is broken into 180 sample segments, corresponding to 22.5 milliseconds of speech
per segment.
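The framing arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch only; the constant and function names are our own, not identifiers from the LPC-10 standard, and real implementations must also decide how to handle a partial trailing frame.

```python
# Sketch of LPC-10 style frame segmentation (names are our own assumptions).
SAMPLE_RATE = 8000   # samples per second
FRAME_LEN = 180      # samples per segment: 180 / 8000 = 22.5 ms

def segment_speech(samples):
    """Split speech samples into 180-sample analysis frames,
    discarding any incomplete trailing frame."""
    n_frames = len(samples) // FRAME_LEN
    return [samples[i * FRAME_LEN:(i + 1) * FRAME_LEN]
            for i in range(n_frames)]

frames = segment_speech([0.0] * SAMPLE_RATE)  # one second of (silent) input
print(len(frames))                            # 44 complete frames
print(FRAME_LEN / SAMPLE_RATE * 1000)         # 22.5 (ms per frame)
```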
The Voiced/Unvoiced Decision
If we compare Figures 18.2 and 18.3 , we can see there are two major differences. Notice
that the samples of the voiced speech have larger amplitude; that is, there is more energy in
the voiced speech. Also, the unvoiced speech contains higher frequencies. As both speech
segments have average values close to zero, this means that the unvoiced speech waveform
crosses the x = 0 line more often than the voiced speech sample. Therefore, we can get a
fairly good idea about whether the speech is voiced or unvoiced based on the energy in the
segment relative to background noise and the number of zero crossings within a specified
window. In the LPC-10 algorithm, the speech segment is first low-pass filtered using a filter
with a bandwidth of 1 kHz. The energy at the output relative to the background noise is used to
obtain a tentative decision about whether the signal in the segment should be declared voiced
or unvoiced. The estimate of the background noise is basically the energy in the unvoiced
speech segments. This tentative decision is further refined by counting the number of zero
crossings and checking the magnitude of the coefficients of the vocal tract filter. We will talk
more about this latter point later in this section. Finally, it can be perceptually annoying to
have a single voiced frame sandwiched between unvoiced frames. The voicing decision of the
neighboring frames is considered in order to prevent this from happening.
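The two per-frame measurements described above, segment energy and zero-crossing count, can be sketched as follows. This is a minimal illustration of the idea, not the LPC-10 procedure itself: the function names are our own, and any thresholds used to combine these measurements into a voicing decision are omitted.

```python
# Sketch of the two voiced/unvoiced cues discussed above (our own names).
import math

def frame_energy(frame):
    """Sum of squared samples: voiced frames tend to have more energy."""
    return sum(x * x for x in frame)

def zero_crossings(frame):
    """Count sign changes: unvoiced frames tend to cross x = 0 more often."""
    return sum(1 for a, b in zip(frame, frame[1:])
               if (a >= 0) != (b >= 0))

# Synthetic stand-ins: a 100 Hz sinusoid (voiced-like) versus a weak,
# rapidly alternating signal (unvoiced-like), both 180 samples at 8 kHz.
voiced_like = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(180)]
unvoiced_like = [0.1 * (-1) ** n for n in range(180)]

print(zero_crossings(voiced_like) < zero_crossings(unvoiced_like))  # True
print(frame_energy(voiced_like) > frame_energy(unvoiced_like))      # True
```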
Estimating the Pitch Period
Estimating the pitch period is one of the most computationally intensive steps of the analysis
process. Over the years a number of different algorithms for pitch extraction have been
developed. In Figure 18.2 , it would appear that obtaining a good estimate of the pitch should
be relatively easy. However, we should keep in mind that the segment shown in Figure 18.2
consists of 800 samples, which is considerably more than the samples available to the analysis
algorithm. Furthermore, the segment shown here is noise-free and consists entirely of a voiced
input. It can be a difficult undertaking for a machine to extract the pitch from a short noisy
segment that may contain both voiced and unvoiced samples.
Several algorithms make use of the fact that the autocorrelation of a periodic function,
R_xx(k), will have a maximum when k is equal to the pitch period. Coupled with the fact that the
estimation of the autocorrelation function generally leads to a smoothing out of the noise, this
makes the autocorrelation function a useful tool for obtaining the pitch period. Unfortunately,
there are also some problems with the use of the autocorrelation. Voiced speech is not exactly
periodic, which makes the maximum lower than we would expect from a periodic signal.
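The autocorrelation approach can be sketched as follows. This is an illustrative outline, not the pitch extractor specified in LPC-10: the lag search range is our own assumption (roughly 50-400 Hz at an 8 kHz sampling rate), and the synthetic test signal is perfectly periodic, which real voiced speech is not.

```python
# Sketch of autocorrelation-based pitch estimation (names and lag
# range are our own assumptions, not values from the LPC-10 standard).
import math

def autocorr(x, k):
    """R_xx(k): correlation of the frame with itself shifted by k samples."""
    return sum(x[n] * x[n + k] for n in range(len(x) - k))

def estimate_pitch_period(frame, min_lag=20, max_lag=160):
    """Return the lag k in [min_lag, max_lag] that maximizes R_xx(k)."""
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorr(frame, k))

# Synthetic "voiced" signal: 100 Hz fundamental at 8 kHz, so the true
# pitch period is 80 samples.
frame = [math.sin(2 * math.pi * n / 80) for n in range(360)]
print(estimate_pitch_period(frame))  # 80
```

Note that R_xx(k) also peaks at multiples of the true period (here k = 160), which is one reason practical pitch extractors need more care than this brute-force maximum search.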