Analysis/Synthesis and Analysis by Synthesis Schemes - Introduction to Data Compression


entry differs from the preceding entry in only two places. This structure helps reduce the

search complexity.

The adaptive codebook consists of the excitation vectors from the previous frame. Each

time a new excitation vector is obtained, it is added to the codebook.

In this manner, the

codebook adapts to local statistics.
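As a rough illustration of this update rule, the sketch below keeps a fixed number of recent excitation vectors and drops the oldest as new ones arrive. The class name `AdaptiveCodebook`, the `capacity` parameter, and the squared-error search are assumptions made for illustration; they are not taken from the FS 1016 specification, and the standard's actual codebook layout may differ.

```python
from collections import deque


class AdaptiveCodebook:
    """Toy adaptive codebook holding the most recent excitation vectors."""

    def __init__(self, capacity):
        # capacity is a hypothetical parameter; a deque with maxlen
        # silently discards the oldest entry once it is full.
        self.entries = deque(maxlen=capacity)

    def add(self, excitation):
        # Each newly obtained excitation vector is added to the codebook.
        self.entries.append(tuple(excitation))

    def best_match(self, target):
        # Index of the stored vector with the smallest squared error
        # relative to the target (an assumed, simplified search criterion).
        if not self.entries:
            raise ValueError("codebook is empty")

        def squared_error(i):
            return sum((a - b) ** 2 for a, b in zip(self.entries[i], target))

        return min(range(len(self.entries)), key=squared_error)


codebook = AdaptiveCodebook(capacity=2)
for vector in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    codebook.add(vector)
# Only the two most recent vectors remain, so the codebook
# tracks whatever the signal has looked like lately.
```

Because old entries fall out as new ones arrive, the candidates available to the search always reflect recent signal behavior, which is the sense in which the codebook adapts to local statistics.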

The FS 1016 coder has been shown to provide excellent reproductions in both quiet and

noisy environments at rates of 4.8 kbps and above [239]. Because of the richness of the

excitation signals, the reproduction does not suffer from the problem of sounding artificial.

The lack of a voicing decision makes it more robust to background noise. The quality of the

reproduction of this coder at 4.8 kbps has been shown to be equivalent to a delta modulator

operating at 32 kbps [239]. The price for this quality is much higher complexity and a much

longer coding delay. We will address this last point in the next section.

CCITT G.728 Speech Standard

By their nature, the schemes described in this chapter have some coding delay built into them.

By “coding delay,” we mean the time between when a speech sample is encoded and when

it is decoded if the encoder and decoder were connected back-to-back (i.e., there were no

transmission delays). In the schemes we have studied, a segment of speech is first stored in a

buffer. We do not start extracting the various parameters until a complete segment of speech

is available to us. Once the segment is completely available, it is processed. If the processing

is real time, this means another segment's worth of delay. Finally, once the parameters have

been obtained, coded, and transmitted, the receiver has to wait until at least a significant part

of the information is available before it can start decoding the first sample. Therefore, if a

segment contains 20 milliseconds' worth of data, the coding delay would be approximately

40 to 60 milliseconds. This kind of delay may be acceptable for some

applications; however, there are other applications where such long delays are not acceptable.

For example, in some situations there are several intermediate tandem connections between the

initial transmitter and the final receiver. In such situations, the total delay would be a multiple

of the coding delay of a single connection. The size of the delay would depend on the number

of tandem connections and could rapidly become quite large.
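The delay estimate above can be reproduced with a little arithmetic. The sketch below assumes, as a simplification, that each stage (buffering, real-time processing, and the receiver's wait) costs a whole number of segments, and that the delays of tandem connections simply add. The function names are hypothetical.

```python
def coding_delay_ms(segment_ms, segments_of_delay):
    """Coding delay when each stage costs one segment's worth of time."""
    return segment_ms * segments_of_delay


def tandem_delay_ms(single_link_delay_ms, num_links):
    """Total delay over several tandem connections (assumed additive)."""
    return single_link_delay_ms * num_links


segment_ms = 20

# Buffering one segment plus real-time processing of that segment.
low_ms = coding_delay_ms(segment_ms, 2)
# The same, plus the receiver waiting for up to one more segment.
high_ms = coding_delay_ms(segment_ms, 3)

# Three tandem connections multiply the single-link figures by three.
tandem_low_ms = tandem_delay_ms(low_ms, 3)
tandem_high_ms = tandem_delay_ms(high_ms, 3)

print(low_ms, high_ms, tandem_low_ms, tandem_high_ms)
```

Under these assumptions a 20-millisecond segment gives 40 to 60 milliseconds per connection, and three tandem connections give 120 to 180 milliseconds.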

For such applications, CCITT approved recommendation G.728, a CELP coder with a

coding delay of 2 milliseconds operating at 16 kbps. As the input speech is sampled at 8000

samples per second, this rate corresponds to an average rate of 2 bits per sample.
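The rate figure follows directly from the bit rate and the sampling rate. The short sketch below works it out and adds two helpers for the bit budget and duration of a segment of a given length. The function and constant names are hypothetical, chosen only for this illustration.

```python
SAMPLE_RATE_HZ = 8000  # input sampling rate stated in the text
BIT_RATE_BPS = 16000   # G.728 bit rate stated in the text


def bits_per_sample(bit_rate_bps=BIT_RATE_BPS, sample_rate_hz=SAMPLE_RATE_HZ):
    """Average number of bits spent on each input sample."""
    return bit_rate_bps / sample_rate_hz


def bits_per_segment(samples_per_segment):
    """Bit budget for one segment of the given length."""
    return samples_per_segment * bits_per_sample()


def segment_duration_ms(samples_per_segment, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time spanned by one segment, in milliseconds."""
    return 1000.0 * samples_per_segment / sample_rate_hz


print(bits_per_sample())       # average bits per sample
print(bits_per_segment(5))     # bit budget for a five-sample segment
print(segment_duration_ms(5))  # duration of a five-sample segment
```

For a five-sample segment this gives a budget of 10 bits and a duration of 0.625 milliseconds.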

In order to lower the coding delay, the size of each segment has to be reduced significantly

because the coding delay will be some multiple of the size of the segment. The G.728

recommendation uses a segment size of five samples. With five samples and a rate of 2 bits per

sample, we only have 10 bits available to us. Using only 10 bits, it would be impossible to

encode the parameters of the vocal tract filter as well as the excitation vector. Therefore, the

algorithm obtains the vocal tract filter parameters in a backward adaptive manner; that is, the

vocal tract filter coefficients used to synthesize the current segment are obtained by analyzing

the previously decoded segments. The CCITT also required

that the G.728 algorithm operate under noisy channel conditions. It would be extremely difficult

to extract the pitch period from speech corrupted by channel errors. Therefore, the G.728

algorithm does away with the pitch filter. Instead, the algorithm uses a 50th-order vocal tract
