Analysis/Synthesis and Analysis by Synthesis Schemes - Introduction to Data Compression


Figure 18.1: A model for speech synthesis. (Excitation source -> Vocal tract filter -> Speech)

limitations, it fits in with the techniques described in this chapter; that is, what is stored or
transmitted is not the samples of the source output, but a method for synthesizing the output.
We will study this approach in Section 18.6.

18.3 Speech Compression

A very simplified model of speech synthesis is shown in Figure 18.1. As we described in
Chapter 7, speech is produced by forcing air first through an elastic opening, the vocal cords,
and then through the laryngeal, oral, nasal, and pharynx passages, and finally through the
mouth and the nasal cavity. Everything past the vocal cords is generally referred to as the
vocal tract. The first action generates the sound, which is then modulated into speech as it
passes through the vocal tract.

In Figure 18.1, the excitation source corresponds to the sound generation, and the vocal tract
filter models the vocal tract. As we mentioned in Chapter 7, several different sound
inputs can be generated by different conformations of the vocal cords and the associated
cartilages. Therefore, to generate a specific fragment of speech, we have to generate a sequence
of sound inputs, or excitation signals, together with the corresponding sequence of vocal tract
approximations.
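The model in Figure 18.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not specify the form of the vocal tract filter, so an all-pole recursive filter is assumed here, and the names pulse_train, white_noise, all_pole_filter, and synthesize, as well as the coefficient values in the usage note, are hypothetical.

```python
import random


def pulse_train(n, period):
    """Assumed voiced-style excitation: a unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def white_noise(n, seed=0):
    """Assumed unvoiced-style excitation: Gaussian noise from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def all_pole_filter(excitation, a, state=None):
    """Assumed vocal tract filter: y[n] = x[n] + sum_k a[k] * y[n-1-k].

    `state` holds the most recent outputs, newest first, so that filter
    memory can be carried from one segment to the next.
    """
    history = list(state) if state is not None else [0.0] * len(a)
    out = []
    for x in excitation:
        y = x + sum(ak * yk for ak, yk in zip(a, history))
        out.append(y)
        history = ([y] + history)[:len(a)]
    return out, history


def synthesize(segments, order):
    """Drive a sequence of filters with the matching sequence of excitations.

    `segments` is a list of (excitation, coefficients) pairs, one per
    speech segment; every coefficient list has length `order`.
    """
    state = [0.0] * order
    speech = []
    for excitation, coeffs in segments:
        y, state = all_pole_filter(excitation, coeffs, state)
        speech.extend(y)
    return speech
```

For example, synthesize([(pulse_train(8, 4), [0.5, -0.25]), (white_noise(8), [0.3, 0.1])], order=2) produces 16 output samples from one pulse-driven segment followed by one noise-driven segment.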

At the transmitter, the speech is divided into segments. Each segment is analyzed to
determine an excitation signal and the parameters of the vocal tract filter. In some
schemes, a model for the excitation signal is transmitted to the receiver. The excitation signal
is then synthesized at the receiver and used to drive the vocal tract filter. In other schemes,
the excitation signal itself is obtained using an analysis-by-synthesis approach. This signal is
then used to drive the vocal tract filter, which generates the speech signal.
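The analysis-by-synthesis idea can be illustrated with a small search: each candidate excitation is passed through the filter, and the candidate whose synthesized output is closest to the original segment is selected. The sketch below is a minimal illustration, not any particular standard. It assumes an all-pole filter, a plain squared-error measure, and a small explicit list of candidate excitations; the names run_filter, squared_error, and analysis_by_synthesis are hypothetical.

```python
def run_filter(excitation, a):
    """Assumed all-pole filter: y[n] = x[n] + sum_k a[k] * y[n-1-k]."""
    history = [0.0] * len(a)
    out = []
    for x in excitation:
        y = x + sum(ak * yk for ak, yk in zip(a, history))
        out.append(y)
        history = ([y] + history)[:len(a)]
    return out


def squared_error(u, v):
    """Sum of squared sample differences between two equal-length signals."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def analysis_by_synthesis(target, a, candidates):
    """Return (index, error) of the candidate excitation whose synthesized
    output is closest to `target` in squared error."""
    best_index, best_error = None, float("inf")
    for index, excitation in enumerate(candidates):
        error = squared_error(target, run_filter(excitation, a))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

In such a scheme, only the index of the selected excitation and the filter parameters would need to be sent. The receiver, holding the same candidate list, can then regenerate the segment by running the chosen excitation through the filter.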

Over the years, many different analysis/synthesis speech compression schemes have been
developed, and substantial research into the development of new approaches and the improvement
of existing schemes continues. Given the large amount of information, we can only
sample some of the more popular approaches in this chapter. See [283, 284] for more detailed
coverage and pointers to the vast literature on the subject.

The approaches we will describe in this chapter include channel vocoders, which are of
special historical interest; the linear predictive coder, which is the U.S. Government standard at
the rate of 2.4 kbps; code-excited linear prediction (CELP) based schemes; sinusoidal coders,
which provide excellent performance at rates of 4.8 kbps and higher and are also a part of
several national and international standards; and mixed excitation linear prediction, which is
the new 2.4 kbps federal standard speech coder. In our description of these approaches, we
will use the various national and international standards as examples.
