MFCC Features - Robust Emotion Recognition Using Spectral and Prosodic Features - page 108


Fig. A.1 Mel-filter bank (horizontal axis: frequency in Hz, 0 to 4000; vertical axis: 0 to 1)

$$
H_m(k) =
\begin{cases}
0, & k < f(m-1) \\[4pt]
\dfrac{2\,\bigl(k - f(m-1)\bigr)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{2\,\bigl(f(m+1) - k\bigr)}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\[8pt]
0, & k > f(m+1)
\end{cases}
\tag{A.5}
$$

with m ranging from 0 to M - 1.
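The piecewise definition in Eq. (A.5) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `mel_filter` and `mel_filter_bank` and the `edges` list are hypothetical, `edges` is assumed to hold the M + 2 boundary points f(-1), f(0), ..., f(M) already converted to frequency-bin indices, and the scale factor 2 is taken from the equation as printed.

```python
def mel_filter(k, f_prev, f_center, f_next):
    """Weight H_m(k) of one triangular filter at bin k, following Eq. (A.5).

    f_prev, f_center, f_next stand for f(m-1), f(m), f(m+1).
    """
    if k < f_prev or k > f_next:
        return 0.0  # outside the triangle
    if k <= f_center:
        # rising edge: f(m-1) <= k <= f(m)
        return 2.0 * (k - f_prev) / (f_center - f_prev)
    # falling edge: f(m) < k <= f(m+1)
    return 2.0 * (f_next - k) / (f_next - f_center)


def mel_filter_bank(edges, num_bins):
    """Build M filters from M + 2 boundary bins (hypothetical layout).

    edges[m], edges[m + 1], edges[m + 2] play the roles of
    f(m-1), f(m), f(m+1) for filter m = 0, ..., M - 1.
    """
    num_filters = len(edges) - 2
    return [
        [mel_filter(k, edges[m], edges[m + 1], edges[m + 2])
         for k in range(num_bins)]
        for m in range(num_filters)
    ]


# Two overlapping filters over 13 bins, with made-up boundaries
bank = mel_filter_bank([0, 4, 8, 12], 13)
```

Each filter rises linearly from its left boundary to its center and falls back to zero at its right boundary, so adjacent filters overlap by half their width.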

5. Discrete Cosine Transform (DCT): Since the vocal tract is smooth, the energy levels in adjacent bands tend to be correlated. Applying the DCT to the transformed mel frequency coefficients produces a set of cepstral coefficients. Prior to computing the DCT, the mel spectrum is usually represented on a log scale. This results in a signal in the cepstral domain with a quefrency peak corresponding to the pitch of the signal and a number of formants represented by low-quefrency peaks. Since most of the signal information is represented by the first few MFCCs, the system can be made robust by extracting only those coefficients and ignoring or truncating the higher-order DCT components [1]. Finally, the MFCCs are calculated as [1]

$$
c(n) = \sum_{m=0}^{M-1} \log_{10}\bigl(s(m)\bigr)\,\cos\!\left(\frac{\pi n\,(m - 0.5)}{M}\right), \qquad n = 0, 1, 2, \ldots, C-1
\tag{A.6}
$$

where c(n) are the cepstral coefficients and C is the number of MFCCs.
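As a rough sketch, Eq. (A.6) can be evaluated directly with the standard library. The function name `mfcc_from_mel_energies` is hypothetical, and `s` is assumed to be a list of M strictly positive mel filter-bank outputs s(0), ..., s(M - 1), since the logarithm is undefined otherwise.

```python
import math


def mfcc_from_mel_energies(s, num_coeffs):
    """Evaluate Eq. (A.6) for n = 0, ..., C - 1.

    s          : list of M positive mel filter-bank outputs s(m) (assumed input)
    num_coeffs : C, the number of cepstral coefficients to keep
    """
    M = len(s)
    log_s = [math.log10(v) for v in s]  # log-scale mel spectrum
    return [
        sum(log_s[m] * math.cos(math.pi * n * (m - 0.5) / M)
            for m in range(M))
        for n in range(num_coeffs)
    ]


# Made-up filter-bank outputs, keeping C = 3 coefficients
coeffs = mfcc_from_mel_energies([10.0, 100.0, 1000.0, 10000.0], 3)
```

For n = 0 the cosine term equals 1, so c(0) reduces to the sum of the log filter-bank outputs. Truncating the higher-order components corresponds to choosing a `num_coeffs` smaller than M.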

Traditional MFCC systems use only 8-13 cepstral coefficients. The zeroth
