Digital Signal Processing Reference
2.3.1 Linear Prediction Cepstral Coefficients (LPCCs)
The cepstral coefficients derived from either linear prediction (LP) analysis or a filter
bank approach are treated almost as standard front-end features. Speech systems
built on these features have achieved a very high level of accuracy for
speech recorded in a clean environment. Spectral features represent phonetic
information, as they are derived directly from spectra. Features extracted
from spectra using the energy values of linearly arranged filter banks emphasize
the contribution of all frequency components of a speech signal equally. In this context,
LPCCs are used to capture emotion-specific information manifested through vocal
tract features. In this work, 10th-order LP analysis has been performed on the
speech signal to obtain 13 LPCCs per 20 ms speech frame, using a frame shift
of 10 ms. Human emotion recognition depends equally on two factors:
the expression of the emotion by the speaker and its perception by the listener. The
purpose of using LPCCs is to take the vocal tract characteristics of the speaker into
account while performing automatic emotion recognition [6].
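The framing described above (20 ms frames with a 10 ms shift) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `frame_signal` is hypothetical, and the 16 kHz sampling rate is an assumption, since the text does not state one.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=20.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    # Only complete frames are kept; trailing samples are dropped.
    n_frames = 1 + (len(signal) - frame_len) // shift
    # Row i holds samples [i*shift, i*shift + frame_len).
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx]

# Assumed 16 kHz sampling rate (not given in the text); one second of noise.
rng = np.random.default_rng(0)
frames = frame_signal(rng.standard_normal(16000), sample_rate=16000)
print(frames.shape)
```

At the assumed 16 kHz rate, each frame holds 320 samples and consecutive frames overlap by 160 samples. LP analysis would then be applied to each row independently.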
The cepstrum may be obtained using linear prediction analysis of a speech signal.
The basic idea behind linear predictive analysis is that the nth speech sample can be
estimated by a linear combination of its previous p samples, as shown in the following
equation.
$$ s(n) \approx a_1 s(n-1) + a_2 s(n-2) + a_3 s(n-3) + \cdots + a_p s(n-p) $$
where $a_1, a_2, a_3, \ldots, a_p$ are assumed to be constant over a speech analysis frame. These
are known as predictor coefficients or linear predictive coefficients. These coefficients
are used to predict the speech samples. The difference between the actual and predicted speech
samples is known as the prediction error. It is given by
$$ e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) $$
where $e(n)$ is the error in prediction, $s(n)$ is the original speech signal, $\hat{s}(n)$ is the
predicted speech signal, and the $a_k$ are the predictor coefficients.
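The prediction and its error can be illustrated with a short sketch. The helper names `predict` and `prediction_error` are hypothetical, and samples before the start of the frame are assumed to be zero, which the text does not specify.

```python
import numpy as np

def predict(s, a):
    """Return s_hat with s_hat[n] = sum over k = 1..p of a[k-1] * s[n-k].

    Samples before the start of the frame are treated as zero (an assumption).
    """
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=float)
    s_hat = np.zeros_like(s)
    for k in range(1, len(a) + 1):
        # Shift the signal by k samples and weight it by a_k.
        s_hat[k:] += a[k - 1] * s[:-k]
    return s_hat

def prediction_error(s, a):
    """e(n) = s(n) - s_hat(n)."""
    return np.asarray(s, dtype=float) - predict(s, a)

# Toy signal obeying s(n) = 0.9 * s(n-1), so a single coefficient a_1 = 0.9
# predicts every sample except the first, which has no history.
s = 0.9 ** np.arange(8)
e = prediction_error(s, [0.9])
print(e)
```

In this toy case the error is nonzero only at the first sample. For real speech, the error is never exactly zero, which is why the coefficients are chosen by minimizing its energy.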
To compute a unique set of predictor coefficients, the sum of squared differences
between the actual and predicted speech samples is minimized (error
minimization), as shown in the equation below
$$ E_n = \sum_{m} \left[ s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right]^2 $$
where $m$ indexes the samples in an analysis frame. To solve the above equation
for the LP coefficients, $E_n$ is differentiated with respect to each $a_k$ and the result
is set to zero, as shown below
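As a numerical aside, minimizing $E_n$ is an ordinary least-squares problem, so it can be sketched with a generic solver. The function name `lp_coefficients_lstsq` is hypothetical, and the summation range (m from p to the end of the frame, so that every sample has p predecessors inside the frame) is an assumption. The text does not fix that range, and this sketch is not presented as the method used in the work.

```python
import numpy as np

def lp_coefficients_lstsq(frame, p):
    """Estimate a_1..a_p by minimizing the sum of squared prediction errors.

    Only samples m = p .. M-1 enter the sum, so every predicted sample has
    p predecessors inside the frame (an assumed summation range).
    """
    frame = np.asarray(frame, dtype=float)
    M = len(frame)
    # Column k-1 holds the delayed samples s_n(m-k) for m = p .. M-1.
    X = np.column_stack([frame[p - k : M - k] for k in range(1, p + 1)])
    y = frame[p:]  # the samples s_n(m) being predicted
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# Toy frame generated by s(n) = 1.3 s(n-1) - 0.6 s(n-2), a stable recursion.
s = np.zeros(50)
s[0], s[1] = 1.0, 1.3
for n in range(2, len(s)):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2]

a = lp_coefficients_lstsq(s, p=2)
print(a)
```

Because the toy frame follows a second-order recursion exactly, the estimated coefficients match the generating ones. Setting the derivatives of $E_n$ to zero yields the normal equations of this same least-squares problem.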
 