Audio Features - Intelligent Audio Analysis


6.2.1.5 Linear Prediction

A simple model for the production of speech is based on the assumption that voiced sounds, in particular vowels, can be well modelled by a few resonance frequencies, which are referred to as formants [6]. Therefore, one can assume that subsequent samples of a speech signal are not independent but correlated to some degree, i.e., linear dependencies exist among consecutive samples [6]. Consequently, it should be possible to predict a sample value $s(k)$ from its predecessors [5].

Given a digital speech signal $s(k)$, we may assume its long-term average to be zero [2]. To estimate and model the linear dependencies, the method of Linear Predictive Coding (LPC) is applied. The principle behind LPC is a linear system that describes an output value $s(k)$, with $-\infty < k < +\infty$, as a weighted sum, i.e., as a linear combination of a limited number of preceding values $s(k-i)$ [17]:

$$s(k) = -\sum_{i=1}^{p} a_i\, s(k-i) \qquad (6.31)$$

The minus sign is chosen to simplify further calculations. In practice, one can only expect an error-prone estimate $\hat{s}(k)$ of the actual value $s(k)$. The error $e(k)$ between the two is:

$$e(k) = s(k) - \hat{s}(k) \qquad (6.32)$$

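Since, in practice, the right-hand side of Eq. (6.31) only provides the estimate $\hat{s}(k)$, combining it with Eq. (6.32) yields:

$$s(k) = -\sum_{i=1}^{p} a_i\, s(k-i) + e(k) \qquad (6.33)$$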
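As a minimal sketch of Eqs. (6.31) to (6.33), the following Python snippet computes the prediction $\hat{s}(k)$ from the $p$ preceding samples, the error $e(k)$, and the squared-error criterion for a given list of coefficients. The function names and the convention that samples before $k = 0$ are taken as zero are assumptions for illustration, not part of the original text.

```python
def predict(s, a):
    """Prediction s_hat(k) = -sum_{i=1}^{p} a_i s(k-i), cf. Eq. (6.31).

    Assumption: samples with negative index are treated as zero.
    """
    p = len(a)
    s_hat = []
    for k in range(len(s)):
        acc = 0.0
        for i in range(1, p + 1):
            if k - i >= 0:
                acc -= a[i - 1] * s[k - i]
        s_hat.append(acc)
    return s_hat


def prediction_error(s, a):
    """Error e(k) = s(k) - s_hat(k), cf. Eq. (6.32)."""
    return [x - y for x, y in zip(s, predict(s, a))]


def squared_error(s, a):
    """Sum of squared prediction errors, i.e. the optimisation criterion."""
    return sum(e * e for e in prediction_error(s, a))


# Usage: a decaying exponential s(k) = 0.9**k is predicted almost perfectly
# by a_1 = -0.9; only k = 0 has no predecessor and contributes e(0) = 1.
s = [0.9 ** k for k in range(10)]
print(round(squared_error(s, [-0.9]), 6))  # 1.0
print(squared_error(s, [-0.9]) < squared_error(s, [0.0]))  # True
```

Rearranged, every sample satisfies $s(k) = \hat{s}(k) + e(k)$, which is exactly Eq. (6.33).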

The weights $a_i$ are the so-called predictor coefficients. The upper summation limit $p$ is the order of the predictor. The predictor coefficients now have to be determined such that, within a given interval, the estimates $\hat{s}(k)$ conform well with the actual values $s(k)$, i.e., the prediction error is minimal. The optimisation criterion is the squared error. In addition, the order $p$ should be as small as possible so that few coefficients are required [17]. Just like spectral parameters, the predictor coefficients need to be computed for short segments, as speech signals vary over time.
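This section does not yet say how the squared-error criterion is minimised. One common choice, swapped in here as an assumption rather than taken from the text, is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below applies it frame by frame, since the coefficients have to be recomputed for short segments. The frame length, hop size and function names are hypothetical; in practice a tapering window (e.g., Hamming) is usually applied to each frame before the autocorrelation is computed, which is omitted here for brevity.

```python
def autocorrelation(x, max_lag):
    """Short-time autocorrelation r(m) = sum_k x(k) x(k-m), m = 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k - m] for k in range(m, n)) for m in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve the normal equations for A(z) = 1 + sum_{i=1}^{p} a_i z^-i.

    Returns ([a_1, ..., a_p], residual error energy). The sign convention
    matches e(k) = s(k) + sum_i a_i s(k-i).
    """
    a = [0.0] * (p + 1)  # a[0] stands for the implicit leading 1 and is unused
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop the recursion
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks with every stage
    return a[1:], err


def lpc_per_frame(signal, p, frame_len, hop):
    """Estimate predictor coefficients for each (unwindowed) frame."""
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        a, _ = levinson_durbin(autocorrelation(frame, p), p)
        coeffs.append(a)
    return coeffs


# Usage: an exponential decay is a first-order process, so with p = 2 each
# frame should yield a_1 close to -0.9 and a_2 close to 0.
signal = [0.9 ** k for k in range(400)]
for a in lpc_per_frame(signal, p=2, frame_len=200, hop=100):
    print([round(c, 4) for c in a])
```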

It can be seen that the predictor polynomial represents a digital filter of order $p$, which can be used either to produce the speech signal $s(k)$, using $e(k)$ as the input signal, or to produce the error signal $e(k)$. The weights $a_i$ completely describe the corresponding linear system. If one uses the speech signal as input to the predictor, the system is a digital transversal filter and, rearranging Eq. (6.33), one obtains the error signal:

$$e(k) = s(k) + \sum_{i=1}^{p} a_i\, s(k-i) \qquad (6.34)$$
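To illustrate the two uses of the predictor filter, the sketch below implements the transversal (FIR) analysis filter of Eq. (6.34), which maps $s(k)$ to $e(k)$, and the recursive (all-pole) synthesis filter obtained from Eq. (6.33), which maps $e(k)$ back to $s(k)$. The function names and the zero initial conditions are assumptions for illustration. With the same coefficients and zero initial state, feeding the error signal into the synthesis filter recovers the original samples up to floating-point rounding.

```python
def analysis_filter(s, a):
    """Transversal (FIR) filter, Eq. (6.34): e(k) = s(k) + sum_i a_i s(k-i)."""
    e = []
    for k in range(len(s)):
        acc = s[k]
        for i, ai in enumerate(a, start=1):
            if k - i >= 0:  # zero initial conditions assumed
                acc += ai * s[k - i]
        e.append(acc)
    return e


def synthesis_filter(e, a):
    """Recursive filter, Eq. (6.33): s(k) = -sum_i a_i s(k-i) + e(k)."""
    s = []
    for k in range(len(e)):
        acc = e[k]
        for i, ai in enumerate(a, start=1):
            if k - i >= 0:  # feeds back already reconstructed samples
                acc -= ai * s[k - i]
        s.append(acc)
    return s


print(analysis_filter([1.0, 0.0, 0.0], [-0.5]))   # [1.0, -0.5, 0.0]
print(synthesis_filter([1.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25]

s = [1.0, 0.5, -0.25, 0.8, 0.0, -1.2]
a = [-0.6, 0.2]  # hypothetical predictor coefficients
s_rec = synthesis_filter(analysis_filter(s, a), a)
print(max(abs(x - y) for x, y in zip(s, s_rec)))  # at most rounding error
```

The second print shows why the synthesis direction is recursive: a single input impulse keeps ringing through the feedback of past outputs, whereas the analysis filter only ever looks back $p$ input samples.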
