Digital Signal Processing Reference
In-Depth Information
[Figure: bar graph of recognition performance (vertical axis ticks from 0 to 90) for each
emotion on the horizontal axis, labeled Emotions: Anger, Disgust, Fear, Happy, Neutral, Sad,
Sarcastic, Surprise. Legend: Source, System, Prosodic, Source + System + Prosodic.]
Fig. 4.9 Comparison of emotion recognition performance with respect to each emotion using
excitation source, spectral, prosodic, and source + spectral + prosodic features derived on
IITKGP-SESC
like neutral, sadness and sarcasm are recognized well by spectral features. Surprise
is an exception to this trend. Spectral features may recognize slow emotions well
because the vocal tract has sufficient time to take a specific shape while the
emotion is being expressed. Notably, after the features are combined, the performance
is better than the best of the three individual feature sets for every emotion except
disgust. This indicates that all three components of speech contribute supplementary
information toward emotion recognition. The bar graph in Fig. 4.9 visualizes these
results.
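
One common way to combine evidence from several feature sets is score-level fusion, sketched below. This is only an illustration: the passage above does not state how the source, system and prosodic evidence is combined, so the weighted-sum rule, the equal weights, the function names `fuse_scores` and `predict`, and all score values are assumptions made for the example, not results from the study.

```python
def fuse_scores(score_sets, weights):
    """Weighted sum of per-emotion scores from several classifiers.

    score_sets: list of dicts mapping emotion -> score (same keys in each).
    weights:    one weight per classifier (assumed here, not from the source).
    """
    emotions = score_sets[0].keys()
    return {
        e: sum(w * scores[e] for w, scores in zip(weights, score_sets))
        for e in emotions
    }


def predict(scores):
    """Return the emotion with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores for one test utterance (illustrative values only).
source_scores = {"anger": 0.5, "sad": 0.3, "neutral": 0.2}
system_scores = {"anger": 0.2, "sad": 0.5, "neutral": 0.3}
prosodic_scores = {"anger": 0.3, "sad": 0.6, "neutral": 0.1}

fused = fuse_scores(
    [source_scores, system_scores, prosodic_scores],
    weights=[1 / 3, 1 / 3, 1 / 3],  # equal weighting is an assumption
)
decision = predict(fused)
```

In this toy case the source scores alone favor anger, while the fused scores favor sad, which shows how evidence from the other two feature sets can change the decision.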
The Berlin emotion speech corpus (Emo-DB) is widely used by researchers to compare
proposed features and methods for classifying emotions. In this study, the combination
of excitation source, vocal tract (VT) system, and local prosodic features is used to
study emotion classification on Emo-DB. Of the 10 speakers, the speech data of 8 is
used for training the models, and the speech data of the remaining 2 is used for
testing them. The emotion recognition performance of the individual features and of
their combination is given in Table 4.6. The results in Table 4.6 show a 15%
improvement in emotion recognition performance when the source, system and prosodic
features derived from Emo-DB are combined. The results in Tables 4.5 and 4.6 show that
the emotion recognition (ER) performance using combinations of features derived from
IITKGP-SESC and Emo-DB is almost the same. The bar graph in Fig. 4.10 compares the
emotion-wise recognition performance for the emotions present in Emo-DB using source,
system, prosodic, and source + system + prosodic features.
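
The speaker-wise train/test split described above can be sketched as follows. This is a minimal sketch, assuming the publicly distributed Emo-DB file naming in which the first two characters of each file name are the speaker ID. The text does not say which 2 speakers are held out, so the choice of "15" and "16" is hypothetical, as are the helper names `speaker_of` and `split_by_speaker` and the short file list.

```python
def speaker_of(filename):
    """Return the speaker ID, assumed to be the first two characters."""
    return filename[:2]


def split_by_speaker(filenames, test_speakers):
    """Put every utterance of a held-out speaker in the test set."""
    test_speakers = set(test_speakers)
    train, test = [], []
    for name in filenames:
        if speaker_of(name) in test_speakers:
            test.append(name)
        else:
            train.append(name)
    return train, test


# Illustrative file names following the Emo-DB naming pattern.
files = ["03a01Fa.wav", "08a02Wb.wav", "15b03Na.wav", "16a04Tc.wav"]

# Hypothetical choice of the 2 held-out speakers.
train_files, test_files = split_by_speaker(files, test_speakers={"15", "16"})
```

Splitting by speaker rather than by utterance keeps the test speakers unseen during training, so the reported performance reflects speaker-independent recognition.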