Analysis by Synthesis LPC Coding - Digital Speech: Coding for Low Bit Rate Communication Systems


where $a_i$ are the LPC coefficients and $p$ is the filter order. The filter is made time-varying to follow changes in the speech spectrum, with its coefficients typically updated every 20-30 ms. The filter order, $p$, is usually chosen to be around 8 to 12.
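Equation (7.1) itself falls outside this excerpt. Assuming it has the usual all-pole form $1/A(z)$ with $A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$ (the same sign convention as the pitch filter below), the synthesis filter reduces to the difference equation $y(n) = x(n) + \sum_{i=1}^{p} a_i\, y(n-i)$. The following is a minimal sketch of that recursion, not the book's implementation; the function name `lpc_synthesis` and the coefficient values are hypothetical. It returns the filter memory so that the next frame can continue from it.

```python
from typing import List, Sequence, Tuple


def lpc_synthesis(excitation: Sequence[float], a: Sequence[float],
                  memory: Sequence[float]) -> Tuple[List[float], List[float]]:
    """All-pole LPC synthesis filter 1/A(z), ASSUMING A(z) = 1 - sum_{i=1}^{p} a_i z^-i.

    `memory` holds the last p outputs, most recent first: [y(n-1), ..., y(n-p)].
    Returns the synthesized block and the memory to pass to the next frame.
    """
    p = len(a)
    if len(memory) != p:
        raise ValueError("memory must hold exactly p past outputs")
    state = list(memory)
    out: List[float] = []
    for x in excitation:
        # y(n) = x(n) + sum_{i=1}^{p} a_i * y(n - i)
        y = x + sum(a[i] * state[i] for i in range(p))
        out.append(y)
        state = ([y] + state)[:p]  # newest output first, oldest dropped
    return out, state


# Hypothetical 2nd-order example (practical coders use p of about 8 to 12).
a = [0.5, 0.25]
y, mem = lpc_synthesis([1.0, 0.0, 0.0], a, [0.0, 0.0])
print(y, mem)  # [1.0, 0.5, 0.5] [0.5, 0.5]
```

Passing `mem` into the next call, rather than zeros, is what lets the filter's response continue smoothly across frame boundaries.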

The pitch filter models the long-term correlation in speech (the fine spectral

structure) and is given by,

$$\frac{1}{P(z)} = \frac{1}{1 - \sum_{i=-I}^{I} b_i z^{-(D+i)}} \qquad (7.2)$$

where $D$ is the delay pointing to the long-term correlation, which usually corresponds to the pitch period or a multiple of it, and $b_i$ are the pitch (or LTP) gain coefficients. Again, this is a time-varying filter, but it is usually updated more often than the LPC filter, e.g. every 5-10 ms. The filter has $2I+1$ taps; typically $I = 0$, i.e. 1 tap, or $I = 1$, i.e. 3 taps.

Note that because both filters are recursive, their working buffers contain memory carried over from the previous frame of analysis. Preserving this filter memory and including it in the AbS analysis is very important, as it reflects the past history of the analysis, including any errors incurred in previous frames. It also smooths the distortions caused by the block-oriented analysis, such as edge effects.
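Equation (7.2) corresponds to the difference equation $y(n) = x(n) + \sum_{i=-I}^{I} b_i\, y(n-D-i)$. The sketch below implements it under stated assumptions: an integer lag (no fractional delay), taps ordered $b_{-I}, \dots, b_{I}$, and $D - I \ge 1$ so the filter stays causal. The names `pitch_synthesis` and `history`, the gains, and the lag are hypothetical. The usage part illustrates the point about filter memory: processing the signal frame by frame with the history buffer carried over reproduces a single uninterrupted pass exactly, whereas resetting the buffer at each frame loses the pitch "echo" that should spill into the next frame.

```python
from typing import List, Sequence, Tuple


def pitch_synthesis(excitation: Sequence[float], b: Sequence[float], D: int,
                    history: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Long-term synthesis filter 1/P(z): y(n) = x(n) + sum_{i=-I}^{I} b_i y(n-D-i).

    `b` holds the 2I+1 gains ordered b_{-I}, ..., b_{+I}; `history` holds the
    last D+I outputs, oldest first. Assumes an integer lag with D - I >= 1.
    """
    if len(b) % 2 != 1:
        raise ValueError("b must have 2I+1 taps")
    I = len(b) // 2
    if D - I < 1:
        raise ValueError("lag too short: need D - I >= 1")
    if len(history) != D + I:
        raise ValueError("history must hold D+I past outputs")
    buf = list(history)  # buf[-k] == y(n-k) for the sample n being computed
    out: List[float] = []
    for x in excitation:
        y = x + sum(b[i + I] * buf[-(D + i)] for i in range(-I, I + 1))
        out.append(y)
        buf.append(y)
    return out, buf[-(D + I):]  # memory for the next frame


# Hypothetical 3-tap predictor (I = 1) with lag D = 40 samples.
b, D = [0.1, 0.6, 0.1], 40
I = len(b) // 2
x = [1.0 if n == 0 else 0.0 for n in range(160)]  # a single pulse

one_pass, _ = pitch_synthesis(x, b, D, [0.0] * (D + I))

framed, hist = [], [0.0] * (D + I)  # 40-sample frames, memory carried over
reset = []                          # 40-sample frames, memory discarded
for start in range(0, len(x), 40):
    y, hist = pitch_synthesis(x[start:start + 40], b, D, hist)
    framed.extend(y)
    y, _ = pitch_synthesis(x[start:start + 40], b, D, [0.0] * (D + I))
    reset.extend(y)

print(framed == one_pass, reset == one_pass)  # True False
```

The same reasoning applies to the LPC filter: any recursive filter whose state is zeroed at frame boundaries produces a different, discontinuous output.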

7.2.2 Perceptually-based Minimization Procedure

The AbS-LPC coder of Figure 7.2 minimizes the error between the original signal $s(n)$ and the synthesized signal $\hat{s}(n)$ according to a suitable error criterion, by varying the excitation signal and the LPC and pitch filters. As described earlier, this is achieved via a sequential procedure: first the time-varying filter parameters are determined, then the excitation is optimized.
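The second step of this sequential procedure can be sketched as a closed-loop search. The sketch assumes that the excitation is chosen from a fixed codebook of candidate vectors, as in CELP-type coders. This is an illustrative assumption, since the excitation model is not specified here. It also omits the pitch filter and uses the plain mean squared error with no perceptual weighting. With the filter coefficients already fixed, the contribution of the filter memory (its zero-input response) is subtracted first. Each candidate is then synthesized, scaled by its least-squares gain, and compared with the target. The names `all_pole` and `search_excitation`, the codebook, and all numeric values are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def all_pole(x: Sequence[float], a: Sequence[float],
             mem: Sequence[float]) -> Tuple[List[float], List[float]]:
    """y(n) = x(n) + sum_{i=1}^{p} a_i y(n-i); mem = [y(n-1), ..., y(n-p)]."""
    state, out = list(mem), []
    for v in x:
        y = v + sum(ai * si for ai, si in zip(a, state))
        out.append(y)
        state = ([y] + state)[:len(a)]
    return out, state


def search_excitation(s: Sequence[float], a: Sequence[float], mem: Sequence[float],
                      codebook: Sequence[Sequence[float]]) -> Tuple[Optional[int], float, float]:
    """Return (index, gain, error) minimizing sum_n (s(n) - s_hat(n))^2."""
    # The filters are already fixed; remove the ringing of the filter memory first.
    zir, _ = all_pole([0.0] * len(s), a, mem)
    target = [si - zi for si, zi in zip(s, zir)]
    best: Tuple[Optional[int], float, float] = (None, 0.0, float("inf"))
    for k, c in enumerate(codebook):
        y, _ = all_pole(c, a, [0.0] * len(a))  # zero-state response to candidate k
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (k, gain, err)
    return best


# Usage: build s(n) from a known codevector, gain and filter memory, then recover them.
random.seed(0)
N = 40
codebook = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(16)]
a, mem = [1.3, -0.6], [0.4, -0.2]  # hypothetical stable 2nd-order filter
s, _ = all_pole([2.5 * v for v in codebook[3]], a, mem)
k, gain, err = search_excitation(s, a, mem, codebook)
print(k, round(gain, 6))  # 3 2.5
```

Splitting the synthesized signal into a zero-input part (fixed by the memory) and a zero-state part (due to the candidate) works because the filter is linear. This is also why the preserved filter memory must enter the search.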

The optimization criterion for both procedures is the commonly used mean squared error, which offers simplicity and adequate performance.

However, at low bit rates the coding capacity is one bit per sample or less, so it is more difficult to match the waveform closely than in schemes operating above, say, 16 kb/s, where more than 1 bit/sample is available. Consequently, the mean squared error between the original and reconstructed signals becomes less meaningful and less than adequate. What is required is an error criterion that is more in sympathy with human perception. Although much work on auditory perception is in progress, no satisfactory error criterion has yet emerged. In the meantime, however, a popular but not totally satisfactory
