Table 2.3 Emotion classification performance (in %) using the spectral features obtained from the entire speech signal, adopting the conventional block processing approach (ERSs 1-5)

Emotions    LPCCs    MFCCs    Formant     LPCCs +     MFCCs +
            (ERS1)   (ERS2)   features    formant     formant
                              (ERS3)      features    features
                                          (ERS4)      (ERS5)
Anger         53       57       33          60          63
Disgust       63       53       33          60          60
Fear          67       60       37          63          63
Happy         77       70       47          73          70
Neutral       80       77       66          83          77
Sadness       70       63       57          73          73
Sarcasm       73       70       64          77          60
Surprise      60       57       40          63          53
Average       68       63.38    47          69          68
ERS1 is developed using 13 LPCC features obtained from the entire speech signal using the conventional block processing approach. ERS2 is developed using 13 MFCC features extracted frame-wise from the entire speech signal. ERS3 is developed using formant-related features: 13 features per 20 ms frame (4 formant frequencies, 4 energy values, 4 bandwidth values and a slope), whose concatenation forms the 13-dimensional feature vector. The average emotion recognition performance using formant features is about 47%. ERS4 and ERS5 are developed by combining the 13 formant features with the 13 LPCCs and the 13 MFCCs, respectively, so the resulting feature vectors have dimension 26. Table 2.3 shows the emotion recognition performance of ERS1-ERS5.
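As a concrete illustration of the frame-wise LPCC extraction behind ERS1, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum conversion. The 8 kHz sampling rate, 20 ms frames with 50% overlap, order 13, and the white-noise stand-in signal are illustrative assumptions; the exact implementation used for these systems may differ.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns predictor coefficients a[1..order] such that
    x[n] is approximated by sum_k a[k] * x[n-k]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)          # coefficients of A(z), a[0] = 1
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err               # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return -a[1:]                    # predictor (direct) form

def lpc_to_lpcc(a, n_ceps):
    """Convert predictor coefficients to LPC cepstral coefficients via
    c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}."""
    p = len(a)
    c = np.zeros(n_ceps)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k - 1] * a[m - k - 1]
        c[m - 1] = acc
    return c

# Frame-wise extraction: 20 ms Hamming-windowed frames, 13 LPCCs each.
fs, frame_len, order = 8000, 160, 13               # 160 samples = 20 ms
x = np.random.default_rng(0).standard_normal(fs)   # stand-in for speech
lpccs = np.array([
    lpc_to_lpcc(lpc(x[s:s + frame_len] * np.hamming(frame_len), order), 13)
    for s in range(0, len(x) - frame_len + 1, frame_len // 2)
])
```

Concatenating each 13-dimensional LPCC (or MFCC) vector with the 13 formant features would give the 26-dimensional vectors described for ERS4 and ERS5.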
ERS6, ERS7, ERS8 and ERS9 are developed using the spectral features extracted from the vowel regions of the utterances [30]. LPCCs, MFCCs and formant features are extracted from 60 ms of the speech signal, chosen from the steady portion of the vowel region of each syllable. Table 2.4 shows the performance of ERSs 6-9.
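The formant frequencies and bandwidths used in these formant feature vectors are commonly obtained from the roots of the LPC polynomial, although the source does not state that this exact method is used; the sketch below, with its 8 kHz sampling rate and heuristic selection thresholds, is one plausible realisation. The energy and slope components of the 13-dimensional formant vector are not shown.

```python
import numpy as np

def formants_from_lpc(a, fs, n_formants=4):
    """Estimate formant frequencies and bandwidths (Hz) from predictor
    coefficients a[1..p] (x[n] ~ sum_k a[k] * x[n-k])."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # A(z)
    roots = np.roots(poly)
    roots = roots[np.imag(roots) > 0]        # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    freqs, bws = freqs[order], bws[order]
    keep = (freqs > 90.0) & (bws < 400.0)    # heuristic formant criteria
    return freqs[keep][:n_formants], bws[keep][:n_formants]

# Sanity check: a single resonance at 1 kHz (pole radius 0.95, fs = 8 kHz).
r, theta = 0.95, 2.0 * np.pi * 1000.0 / 8000.0
a = np.array([2.0 * r * np.cos(theta), -r * r])
freqs, bws = formants_from_lpc(a, 8000)
```

With a full-order LPC fit (e.g. order 10-13 at 8 kHz), the first four qualifying roots would supply the 4 frequencies and 4 bandwidths of the formant feature vector.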
Similar to the emotion recognition systems developed for vowel regions (ERS6-ERS9), four systems are developed for consonant regions (ERS10-ERS13) and four systems are developed for CV transition regions (ERS14-ERS17) [30]. Tables 2.5 and 2.6 show the emotion recognition performance of the ERSs developed using LPCC features, MFCC features, LPCCs + formant features, and MFCCs + formant features, extracted from the consonant and CV transition regions of the syllables, respectively.
Pitch-synchronously extracted spectral features are used to capture the finer-level spectral dynamics specific to speech emotions. Therefore, spectral and formant features extracted from each pitch period are used to develop ERSs 18-21. Table 2.7 shows the emotion recognition performance using the spectral features extracted through pitch-synchronous analysis.
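Pitch-synchronous analysis requires locating successive pitch periods before any spectral features are computed. The sketch below uses a simple autocorrelation-based period estimate and fixed-length period slicing; real pitch-mark detection (and the exact procedure behind ERSs 18-21) is more involved, so the lag search range and the 40 ms analysis window are illustrative assumptions.

```python
import numpy as np

def pitch_period(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate the pitch period (in samples) of a voiced frame as the
    autocorrelation peak within the lag range [fs/fmax, fs/fmin]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return lo + int(np.argmax(r[lo:hi + 1]))

def pitch_synchronous_frames(x, fs):
    """Yield consecutive one-pitch-period segments of x (a simplified
    stand-in for true pitch-mark-anchored segmentation)."""
    t0 = pitch_period(x[:int(0.040 * fs)], fs)   # period from first 40 ms
    for start in range(0, len(x) - t0 + 1, t0):
        yield x[start:start + t0]

# Demo on a synthetic 100 Hz voiced-like signal sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * 100.0 * t)
periods = list(pitch_synchronous_frames(x, fs))
```

Each yielded segment would then be passed to the LPCC, MFCC, or formant extraction stage, giving one feature vector per pitch period instead of per fixed-length frame.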