Digital Signal Processing Reference
In-Depth Information
where $a_i$ are the LPC coefficients and $p$ is the filter order. The filter
is made time-varying to track changes in the speech spectrum, with its
coefficients typically updated every 20-30 ms. The order of the filter, $p$,
is usually chosen to be around 8 to 12.
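The frame-by-frame update of the LPC synthesis filter can be sketched as below. This is only an illustration: it assumes the all-pole form s(n) = e(n) + sum of a_i s(n-i) for i = 1..p, which matches the usual convention but is not restated in this passage, and the function name `lpc_synthesize`, the toy order p = 2 and the coefficient values are hypothetical.

```python
def lpc_synthesize(excitation, a, memory):
    """All-pole synthesis, assuming s[n] = e[n] + sum_{i=1..p} a[i-1] * s[n-i].

    memory holds the last p output samples, most recent first.
    Returns the frame output and the updated memory for the next frame.
    """
    p = len(a)
    state = list(memory)
    out = []
    for e in excitation:
        s = e + sum(a[i] * state[i] for i in range(p))
        out.append(s)
        state = [s] + state[:p - 1]  # shift the newest sample into the state
    return out, state


# Two short "frames", each with its own coefficient set (toy values, p = 2).
frames = [
    ([1.0, 0.0, 0.0], [0.5, 0.0]),
    ([0.0, 0.0, 0.0], [0.0, 0.25]),
]
memory = [0.0, 0.0]
speech = []
for excitation, coeffs in frames:
    frame_out, memory = lpc_synthesize(excitation, coeffs, memory)
    speech.extend(frame_out)
```

In a real coder each frame would span 20-30 ms of samples and p would be 8 to 12; the tiny frames here only keep the arithmetic easy to follow.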
The pitch filter models the long-term correlation in speech (the fine spectral
structure) and is given by

$$\frac{1}{P(z)} = \frac{1}{1 - \sum_{i=-I}^{I} b_i z^{-(D+i)}} \qquad (7.2)$$

where $D$ is a pointer to the long-term correlation, which usually corresponds
to the pitch period or its multiples, and $b_i$ are the pitch (or LTP) gain
coefficients. Again, this is a time-varying filter, but it is usually updated
more often than the LPC filter, e.g. every 5-10 ms. The number of filter taps
is typically set by $I = 0$, i.e. 1 tap, or $I = 1$, i.e. 3 taps. Note that
because of the recursive nature of the two filters, both contain memory in
their working buffers carried over from the previous frame of analysis.
The preservation and inclusion of this filter memory in the AbS analysis is
very important, as it reflects the past history of the analysis and includes
any errors incurred in the previous frames. It also smooths the distortions
caused by the block-oriented analysis, such as edge effects.
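To make the role of the filter memory concrete, here is a minimal sketch of the recursive pitch synthesis filter, written as the difference equation y(n) = x(n) + sum of b_i y(n-(D+i)) for i = -I..I. The function name `pitch_synthesize`, the lag D = 4, the gain 0.5 and the frame lengths are all hypothetical toy values chosen for illustration.

```python
def pitch_synthesize(excitation, b, D, history):
    """Long-term synthesis: y[n] = x[n] + sum_{i=-I..I} b_i * y[n-(D+i)].

    b       -- 2*I + 1 gains ordered b_{-I} .. b_{I}
    D       -- lag in samples; D - I >= 1 so only past outputs are used
    history -- past outputs, oldest first, at least D + I samples
    Returns the frame output and the history to carry into the next frame.
    """
    I = (len(b) - 1) // 2
    if len(b) != 2 * I + 1 or D - I < 1 or len(history) < D + I:
        raise ValueError("inconsistent taps, lag, or history length")
    buf = list(history)
    start = len(buf)
    for x in excitation:
        n = len(buf)  # index the new sample will occupy
        y = x + sum(b[k] * buf[n - (D + k - I)] for k in range(len(b)))
        buf.append(y)
    return buf[start:], buf[-(D + I):]


D, b = 4, [0.5]                      # 1-tap filter (I = 0)
excitation = [1.0] + [0.0] * 11      # single impulse
zeros = [0.0] * D

# Whole signal in one pass.
one_shot, _ = pitch_synthesize(excitation, b, D, zeros)

# Same signal in two frames, carrying the memory across the boundary.
f1, hist = pitch_synthesize(excitation[:5], b, D, zeros)
f2, _ = pitch_synthesize(excitation[5:], b, D, hist)

# Second frame again, but with the memory wrongly reset to zero.
f2_reset, _ = pitch_synthesize(excitation[5:], b, D, zeros)
```

With the memory carried over, `f1 + f2` reproduces `one_shot`; with the memory reset, the pitch pulse expected in the second frame disappears, a simple instance of the edge effects mentioned above.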
7.2.2 Perceptually-based Minimization Procedure
The AbS-LPC coder of Figure 7.2 minimizes the error between the original
signal $s(n)$ and the synthesized signal $\hat{s}(n)$ according to a suitable
error criterion, by varying the excitation signal and the LPC and pitch
filters. As described earlier, this is achieved via a sequential procedure:
first the time-varying filter parameters are determined, then the excitation
is optimized.
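The second step of the sequential procedure, optimizing the excitation once the filters are fixed, can be sketched as an exhaustive search over candidate excitations. Everything here is an illustrative assumption rather than the coder of Figure 7.2: the one-pole synthesis filter, the three-entry candidate list, and the names `synthesize`, `mean_squared_error` and `search_excitation` are hypothetical, and plain mean squared error is used as the criterion.

```python
def synthesize(excitation, a, memory=0.0):
    """Toy one-pole synthesis filter: s[n] = e[n] + a * s[n-1]."""
    out, prev = [], memory
    for e in excitation:
        prev = e + a * prev
        out.append(prev)
    return out


def mean_squared_error(x, y):
    """Average squared difference between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(x, y)) / len(x)


def search_excitation(target, candidates, a, memory=0.0):
    """Synthesize every candidate and keep the one closest to the target."""
    best_index, best_err = None, float("inf")
    for idx, cand in enumerate(candidates):
        err = mean_squared_error(target, synthesize(cand, a, memory))
        if err < best_err:
            best_index, best_err = idx, err
    return best_index, best_err


# Filter parameter fixed first (step 1), then the excitation is searched (step 2).
a = 0.5
candidates = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
]
target = [0.1, 0.9, 0.5, 0.3]  # stand-in for one frame of original speech
best_index, best_err = search_excitation(target, candidates, a)
```

The search selects the candidate whose synthesized output, not the candidate itself, is closest to the target, which is the defining feature of analysis-by-synthesis.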
The optimization criterion used for both procedures is the commonly
used mean squared error, which offers simplicity and adequate performance.
However, at low bit-rates the coding capacity is one bit per sample or less,
so it is more difficult to match the waveform closely than in schemes above
16 kb/s, where more than 1 bit/sample is available. Consequently,
the mean squared error between the original and reconstructed signals is less
meaningful and less than adequate. What is required is an error criterion
that is more in sympathy with human perception. Although much work
on auditory perception is in progress, no satisfactory error criterion has yet
emerged. In the meantime, however, a popular but not totally satisfactory