6.2.1.5 Linear Prediction
A simple model for the production of speech is based on the assumption that voiced sounds, in particular vowels, can be well modelled by a few resonance frequencies, which are referred to as formants [6]. Therefore, one can assume that subsequent samples of a speech signal are not independent, but correlated to some degree, i.e., linear dependencies exist among consecutive samples [6]. It should thus be possible to predict a sample value $s(k)$ from its predecessors [5].
Given a digital speech signal $s(k)$, we may assume the long-term average to equal zero [2]. To estimate and model the linear dependencies, the method of Linear Predictive Coding (LPC) can be applied. The principle behind LPC is a linear system which describes an output value $s(k)$, with $k$ ranging from $-\infty$ to $+\infty$, as a weighted sum, i.e., as a linear combination of a limited number of preceding values $s(k-i)$ [17]:

$$\hat{s}(k) = -\sum_{i=1}^{p} a_i \, s(k-i). \tag{6.31}$$
The minus sign is chosen to simplify further calculations. In practice, one can only expect an error-prone estimate $\hat{s}(k)$ of the actual value $s(k)$. The error $e(k)$ between these two is:

$$e(k) = s(k) - \hat{s}(k). \tag{6.32}$$
With Eq. (6.31):

$$s(k) = -\sum_{i=1}^{p} a_i \, s(k-i) + e(k). \tag{6.33}$$
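Eqs. (6.31) and (6.32) can be illustrated with a minimal Python sketch. The function names, the toy signal, and the choice to treat samples before the start of the signal as zero are assumptions made for this illustration only; they are not taken from the text.

```python
def predict(s, a, k):
    """Return s_hat(k) = -sum_{i=1..p} a_i * s(k - i), as in Eq. (6.31).

    a[i - 1] holds the coefficient a_i. Samples before the start of the
    signal are treated as zero (an assumption of this sketch).
    """
    p = len(a)
    return -sum(a[i - 1] * s[k - i] for i in range(1, p + 1) if k - i >= 0)


def prediction_error(s, a):
    """Return e(k) = s(k) - s_hat(k) for every k, as in Eq. (6.32)."""
    return [s[k] - predict(s, a, k) for k in range(len(s))]


# Hypothetical toy signal with s(k) = 0.5 * s(k - 1), so a_1 = -0.5
# predicts every sample except the first one without error.
signal = [1.0, 0.5, 0.25, 0.125]
coeffs = [-0.5]
errors = prediction_error(signal, coeffs)
```

Only the first sample, which has no predecessor, leaves a non-zero error here; for real speech the error is small but generally not zero.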
The weights $a_i$ are the so-called predictor coefficients. The upper summation limit $p$ is the order of the predictor. The predictor coefficients now have to be determined such that, within a given interval, the predicted values $\hat{s}(k)$ conform well with the actual values $s(k)$, i.e., the prediction error is minimal. The optimisation criterion is the squared error. In addition, the order $p$ should be as small as possible in order to require as few coefficients as possible [17]. Just like spectral parameters, the predictor coefficients need to be computed for short segments, as speech signals vary over time.
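The text does not state which procedure is used to minimise the squared error. One common choice is the autocorrelation method solved with the Levinson-Durbin recursion, and the sketch below assumes that choice. The function names and the test segment are hypothetical, and the sign convention follows Eq. (6.31).

```python
def autocorrelation(segment, max_lag):
    """Return r[0..max_lag] with r[m] = sum_k segment[k] * segment[k + m]."""
    n = len(segment)
    return [
        sum(segment[k] * segment[k + m] for k in range(n - m))
        for m in range(max_lag + 1)
    ]


def lpc_coefficients(segment, order):
    """Estimate a_1..a_p for one short segment (autocorrelation method).

    Runs the Levinson-Durbin recursion on the autocorrelation sequence.
    Sign convention: s_hat(k) = -sum_i a_i * s(k - i).
    """
    r = autocorrelation(segment, order)
    if r[0] == 0.0:
        return [0.0] * order          # silent segment: nothing to predict
    a = [0.0] * (order + 1)           # a[0] is padding so that a[i] is a_i
    err = r[0]                        # squared error of the order-0 predictor
    for m in range(1, order + 1):
        if err <= 0.0:
            break                     # segment is already predicted exactly
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        reflection = -acc / err
        previous = a[:]
        for i in range(1, m):
            a[i] = previous[i] + reflection * previous[m - i]
        a[m] = reflection
        err *= 1.0 - reflection * reflection
    return a[1:]


# Hypothetical segment: a decaying exponential s(k) = 0.5 ** k.
segment = [0.5 ** k for k in range(50)]
estimated = lpc_coefficients(segment, order=2)
```

For this segment the first coefficient comes out close to -0.5 and the second close to zero, which suggests that order 1 would already suffice. In practice the recursion would be run once per short, typically windowed, segment.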
It can be seen that the predictor polynomial represents a digital filter of order $p$, which can be used either to produce the speech signal $s(k)$ or the error signal $e(k)$ by using $e(k)$ or $s(k)$, respectively, as input signal. The weights $a_i$ completely describe the corresponding linear system. If one uses the speech signal as input to the predictor, the system is a digital transversal filter and one obtains the error signal:

$$e(k) = s(k) + \sum_{i=1}^{p} a_i \, s(k-i). \tag{6.34}$$
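The two uses of the filter can be checked against each other with a short Python sketch: Eq. (6.34) maps the speech signal to the error signal, and Eq. (6.33) maps the error signal back to the speech signal. The function names, sample values, and coefficients are hypothetical, and samples before the start of the signal are assumed to be zero.

```python
def analysis_filter(s, a):
    """Transversal filter: e(k) = s(k) + sum_i a_i * s(k - i), Eq. (6.34)."""
    p = len(a)
    return [
        s[k] + sum(a[i - 1] * s[k - i] for i in range(1, p + 1) if k - i >= 0)
        for k in range(len(s))
    ]


def synthesis_filter(e, a):
    """Recursive filter: s(k) = -sum_i a_i * s(k - i) + e(k), Eq. (6.33)."""
    p = len(a)
    s = []
    for k in range(len(e)):
        # Weighted sum over the output samples produced so far.
        past = sum(a[i - 1] * s[k - i] for i in range(1, p + 1) if k - i >= 0)
        s.append(e[k] - past)
    return s


speech = [1.0, 0.5, -0.25, 0.75, 0.0]   # hypothetical sample values
coeffs = [-0.5, 0.25]                    # hypothetical predictor coefficients
residual = analysis_filter(speech, coeffs)
rebuilt = synthesis_filter(residual, coeffs)
```

Feeding the residual through the recursive filter with the same coefficients returns the original samples up to rounding, which is why the weights $a_i$ together with the error signal are sufficient to describe the segment.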