Speech Signal Analysis and Modelling - Digital Speech: Coding for Low Bit Rate Communication Systems


spectrum. However, as can be seen from Figure 4.11, after the LPC analysis

and inverse filtering there are still considerable variations in the spectrum,

i.e. it is far from white. Looking at the residual signal in Figure 4.10, it is

clear that long-term correlations, especially during voiced regions, still exist

between samples. The most evident of these are the sharp periodic pulses of

the excitation signal, which is hardly surprising as our original source-filter

model assumes this type of input signal. This also explains why the LPC

analysis, which models our vocal tract, cannot adequately remove them.
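The point can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the figures: the "speech" is an impulse train with a period of 50 samples passed through a two-pole resonator, the LPC coefficients are found with the autocorrelation method (Levinson-Durbin recursion, order 4), and the helper names (`synth_voiced`, `levinson_durbin`, `inverse_filter`, `norm_corr`) are hypothetical. The sketch measures the normalized correlation of the LPC residual at the pitch lag and at an unrelated lag.

```python
import math

def synth_voiced(n_samples, period, a1, a2):
    """Toy voiced signal: impulse train through a 2-pole resonator (assumed model)."""
    s = []
    for n in range(n_samples):
        excitation = 1.0 if n % period == 0 else 0.0
        s1 = s[n - 1] if n >= 1 else 0.0
        s2 = s[n - 2] if n >= 2 else 0.0
        s.append(excitation + a1 * s1 + a2 * s2)
    return s

def autocorr(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite sequence."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= (1.0 - k_m * k_m)             # remaining prediction error energy
    return a[1:]

def inverse_filter(s, a):
    """LPC residual: s[n] minus its short-term prediction from past samples."""
    res = []
    for n in range(len(s)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        res.append(s[n] - pred)
    return res

def norm_corr(x, lag):
    """Normalized correlation between x[n] and x[n-lag]."""
    num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
    e0 = sum(x[n] ** 2 for n in range(lag, len(x)))
    e1 = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
    return num / math.sqrt(e0 * e1)

PITCH = 50                                   # assumed pitch period in samples
speech = synth_voiced(400, PITCH, 1.2, -0.81)
lpc = levinson_durbin(autocorr(speech, 4), 4)
residual = inverse_filter(speech, lpc)
corr_at_pitch = norm_corr(residual, PITCH)
corr_off_pitch = norm_corr(residual, PITCH - 13)
print("correlation at pitch lag :", corr_at_pitch)
print("correlation at other lag :", corr_off_pitch)
```

In this toy case the short-term predictor removes the resonance but leaves the pulse train essentially intact, so the residual stays strongly correlated with itself one pitch period earlier. Real speech is less regular, so the correlation at the pitch lag would normally be lower than in this idealized signal.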

To remove the periodic structure of the residual or excitation signal, a

second stage of prediction is required. The objective of this second stage is

again to spectrally flatten our signal, i.e. to remove the periodic fine structure.

Unlike LPC analysis, however, it exploits the correlation between speech samples

that are one or more 'pitch' periods apart. For this reason, the pitch predictor

(filter) is usually called the long-term predictor (LTP), and the filter delay is

called the lag.
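A minimal sketch of such a second prediction stage is given here, under stated assumptions. It uses a single predictor tap, picks the lag by an open-loop search that maximizes the squared correlation divided by the energy of the delayed signal, and sets the gain to the least-squares value for that lag. The function names (`one_tap_ltp`, `toy_residual`), the lag range of 20 to 60 samples, and the toy input are all hypothetical; practical coders typically refine this considerably.

```python
def one_tap_ltp(residual, min_lag, max_lag):
    """Open-loop one-tap long-term prediction (illustrative sketch).

    Returns (lag, gain, flattened), where
    flattened[n] = residual[n] - gain * residual[n - lag] for n >= lag.
    The first `lag` samples have no history and are passed through unchanged.
    """
    n_total = len(residual)
    best_lag, best_score, best_gain = min_lag, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(residual[n] * residual[n - lag] for n in range(lag, n_total))
        den = sum(residual[n - lag] ** 2 for n in range(lag, n_total))
        if num <= 0.0 or den <= 0.0:
            continue                          # no useful positive correlation
        score = num * num / den               # error-energy reduction for this lag
        if score > best_score:
            best_lag, best_score, best_gain = lag, score, num / den
    flattened = [
        residual[n] - best_gain * residual[n - best_lag] if n >= best_lag
        else residual[n]
        for n in range(n_total)
    ]
    return best_lag, best_gain, flattened

def toy_residual(n_periods, period, decay):
    """Assumed test input: a short pulse repeated every `period` samples,
    with its amplitude scaled by `decay` from one period to the next."""
    pulse = [1.0, -0.5, 0.25]
    x = [0.0] * (n_periods * period)
    for k in range(n_periods):
        for i, p in enumerate(pulse):
            x[k * period + i] = (decay ** k) * p
    return x

toy = toy_residual(8, 40, 0.9)
lag, gain, flattened = one_tap_ltp(toy, 20, 60)
energy_before = sum(v * v for v in toy[lag:])
energy_after = sum(v * v for v in flattened[lag:])
print("lag:", lag, "gain:", gain)
print("energy before:", energy_before, "energy after:", energy_after)
```

Because the toy input repeats exactly apart from a constant scaling, the predictor removes almost all of the energy beyond the first period. With a real LPC residual the pulses change shape from period to period and the lag is rarely an exact integer, so only part of the periodic structure would be removed.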

4.4.2 Pitch Predictor (Filter) Formulation

Before discussing methods of pitch or long-term prediction, it is perhaps

worth considering what our objectives are. Our aim is to model the long-term

correlation left in the speech residual signal after LPC inverse filtering (or in

the original speech signal) such that when the model parameters are used

in a filter, it will remove the long-term correlation as much as possible, or

spectrally flatten our signal. There are no obvious reasons why we must

use the LPC residual and not the original signal to model the long-term

correlation in the speech signal, as long as the effects of the formants are

taken into account during the determination of the long-term delay (pitch) in

our model. Indeed, in Atal's original formulations in APC [10] (and in other

APC-related schemes), the pitch predictor was applied before the LPC. The

order in which they are combined is not critical, provided that the combination is

carefully optimized; for example, block-edge effects must be compensated to

avoid 'clicking' distortions. It is worth noting that the prediction gain of

the combined system will always be less than the sum of the gains in systems

employing the pitch and LPC filters in isolation. This is because in reality

the vocal tract and excitation are not completely separable as assumed in our

model, but are interconnected. The pitch filter can be interpreted as

$$\frac{1}{P(z)} = \frac{1}{1 - \sum_{j=-I}^{I} b_j z^{-(j+T)}} \tag{4.53}$$

where $T$ is the 'pitch period', and $b_j$ are the 'pitch gain' coefficients which

reflect the amount of correlation between the distant samples. Referring
