spectrum. However, as can be seen from Figure 4.11, after the LPC analysis
and inverse filtering there are still considerable variations in the spectrum,
i.e. it is far from white. Looking at the residual signal in Figure 4.10, it is
clear that long-term correlations, especially during voiced regions, still exist
between samples. The most evident of these are the sharp periodic pulses of
the excitation signal, which is hardly surprising, since our source-filter
model assumes exactly this type of input signal. It also explains why LPC
analysis, which models only our vocal tract, cannot adequately remove them.
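As a rough numerical illustration of this point (not taken from the text), the sketch below builds a hypothetical voiced signal by driving a 4-pole "vocal tract" with an impulse train, removes the short-term (formant) correlation with order-10 LPC inverse filtering, and then checks the residual's normalized autocorrelation over a range of lags. The 8 kHz rate, pole positions, LPC order, the 40 to 147 sample lag range and the helper names `lpc`, `inverse_filter` and `normalized_corr` are all assumptions made for the demonstration.

```python
import numpy as np

def lpc(x, order):
    """Hypothetical helper: LPC coefficients a[1..order] by the autocorrelation
    method (Levinson-Durbin), for the predictor x_hat[n] = sum_k a[k] x[n-k]."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(x, a):
    """LPC inverse (analysis) filter: e[n] = x[n] - sum_k a[k] x[n-k]."""
    return np.convolve(x, np.concatenate(([1.0], -a)))[: len(x)]

def normalized_corr(e, lag):
    """Normalized correlation between e[n] and e[n - lag]."""
    cur, past = e[lag:], e[:-lag]
    return np.dot(cur, past) / np.sqrt(np.dot(cur, cur) * np.dot(past, past))

# Assumed test signal: impulse train with an 80-sample 'pitch period'
# (10 ms at 8 kHz) driving resonances near 500 Hz and 1500 Hz.
fs, N, T_true = 8000, 2000, 80
excitation = np.zeros(N)
excitation[::T_true] = 1.0
poles = []
for f, radius in [(500, 0.95), (1500, 0.9)]:
    w = 2 * np.pi * f / fs
    poles += [radius * np.exp(1j * w), radius * np.exp(-1j * w)]
den = np.real(np.poly(poles))  # all-pole denominator [1, d1, ..., d4]
speech = np.zeros(N)
for n in range(N):
    acc = excitation[n]
    for k in range(1, len(den)):
        if n >= k:
            acc -= den[k] * speech[n - k]
    speech[n] = acc

a = lpc(speech * np.hamming(N), order=10)
residual = inverse_filter(speech, a)

# The residual is still strongly correlated with itself one pitch period back.
lags = np.arange(40, 148)
corr = np.array([normalized_corr(residual, L) for L in lags])
best_lag = int(lags[np.argmax(corr)])
print(best_lag, round(float(corr.max()), 2))
```

The short-term predictor only spans 10 samples, so it has no way to reach back 80 samples; the pulse train survives in the residual and shows up as a strong correlation peak at the pitch lag.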
To remove the periodic structure of the residual (excitation) signal, a
second stage of prediction is required. Its objective is again to spectrally
flatten our signal, this time by removing the periodic fine structure.
Unlike LPC analysis, however, it exploits the correlation between speech
samples that are one or more 'pitch' periods apart. For this reason, the pitch
predictor (filter) is usually called the long-term predictor (LTP), and the
filter delay is called the lag.
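A minimal sketch of such a second stage, assuming the simplest single-tap form r_hat[n] = b * r[n - T] and an open-loop lag search on the residual. For a given lag, the optimal gain is b = c / g, where c is the correlation between the residual and its delayed copy and g is the delayed copy's energy; the remaining error energy then drops by c^2 / g, so the best lag is the one that maximizes c^2 / g. The synthetic residual (a 90-sample pulse train plus a little noise), the 20 to 147 sample lag range and the names `ltp_single_tap` and `ltp_inverse_filter` are hypothetical.

```python
import numpy as np

def ltp_single_tap(r, min_lag=20, max_lag=147):
    """Hypothetical open-loop search for a single-tap LTP r_hat[n] = b * r[n - T].
    All lags are compared over the same target samples n >= max_lag."""
    target = r[max_lag:]
    best_lag, best_score, best_gain = min_lag, -np.inf, 0.0
    for lag in range(min_lag, max_lag + 1):
        past = r[max_lag - lag : len(r) - lag]  # r[n - lag] for each target n
        c = np.dot(target, past)  # correlation with the delayed residual
        g = np.dot(past, past)    # energy of the delayed residual
        if c > 0 and g > 0 and c * c / g > best_score:
            best_lag, best_score, best_gain = lag, c * c / g, c / g
    return best_lag, best_gain

def ltp_inverse_filter(r, lag, gain):
    """Long-term inverse filter e[n] = r[n] - b * r[n - T]; earlier samples pass unchanged."""
    e = r.copy()
    e[lag:] -= gain * r[:-lag]
    return e

# Assumed stand-in for an LPC residual: periodic pulses plus small white noise.
rng = np.random.default_rng(0)
N, T_true = 800, 90
r = 0.02 * rng.standard_normal(N)
r[::T_true] += 1.0

T, b = ltp_single_tap(r)
e = ltp_inverse_filter(r, T, b)
seg = slice(147, N)  # same target samples used in the lag search
gain_db = 10 * np.log10(np.dot(r[seg], r[seg]) / np.dot(e[seg], e[seg]))
print(T, round(b, 2), round(gain_db, 1))
```

The lag range was chosen so that twice the true period falls outside it; with a wider range, multiples of the pitch period score almost as well, which is the familiar pitch-doubling problem. Using several taps around T instead of one replaces the scalar b by the solution of a small set of normal equations.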
4.4.2 Pitch Predictor (Filter) Formulation
Before discussing methods of pitch or long-term prediction, it is worth
considering what our objectives are. Our aim is to model the long-term
correlation left in the speech residual after LPC inverse filtering (or in
the original speech signal) so that, when the model parameters are used in a
filter, the filter removes as much of this correlation as possible, i.e.
spectrally flattens our signal. There is no obvious reason why we must use
the LPC residual rather than the original signal to model the long-term
correlation in the speech signal, as long as the effects of the formants are
taken into account when the long-term delay (pitch) of our model is
determined. Indeed, in Atal's original formulation of APC [10] (and in other
APC-related schemes), the pitch predictor was applied before the LPC. The
order in which the two are combined is not too critical provided the
combination is carefully optimized; for example, block edge effects must be
carefully compensated to avoid 'clicking' type distortions. It is worth noting
that the prediction gain of the combined system will always be less than the
sum of the gains (in dB) of systems employing the pitch and LPC filters in
isolation. This is because, in reality, the vocal tract and excitation are not
completely separable, as assumed in our model, but are interconnected. The
pitch filter can be interpreted as
\frac{1}{P(z)} = \frac{1}{1 - \sum_{j=-I}^{I} b_j z^{-(j+T)}}    (4.53)
where T is the 'pitch period', b_j are the 'pitch gain' coefficients, which
reflect the amount of correlation between the distant samples, and the
2I + 1 taps are centred on the lag T. Referring