Fig. 3.2 General block architecture of an emotion recognition system using SVMs
Each emotion model is an SVM trained using positive and negative examples. Positive feature vectors are derived from the utterances of the intended emotion, and negative feature vectors are derived from the utterances of all other emotions. Therefore, 8 SVMs are developed to represent the 8 emotions. The basic block diagram of the ER system developed using SVMs is shown in Fig. 3.2. For evaluating the performance of the ER systems, feature vectors are derived from the test utterances and given as inputs to all 8 trained emotion models. The output of each model is fed to the decision module, where the category of the emotion is hypothesized based on the highest evidence among the 8 emotion models.
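As an illustrative sketch (not the implementation used in this work), the one-versus-rest training and decision steps described above may be expressed as follows. The use of scikit-learn's SVC, the RBF kernel, and the particular set of 8 emotion labels are assumptions made only for the example.

# One-versus-rest sketch of the SVM-based ER system described above.
# X is an (n_samples, n_features) array of prosodic feature vectors;
# y is an array of emotion label strings. Labels and kernel are assumed.
import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "neutral", "sadness", "sarcasm", "surprise"]  # assumed label set

def train_emotion_svms(X, y):
    """Train one binary SVM per emotion: positive examples come from the
    intended emotion, negative examples from all other emotions."""
    models = {}
    for emo in EMOTIONS:
        targets = (y == emo).astype(int)          # 1 = intended emotion, 0 = all others
        svm = SVC(kernel="rbf", gamma="scale")    # kernel choice is an assumption
        svm.fit(X, targets)
        models[emo] = svm
    return models

def decide_emotion(models, x):
    """Decision module: hypothesize the emotion whose SVM gives the
    highest evidence (distance from the separating hyperplane)."""
    scores = {emo: m.decision_function(x.reshape(1, -1))[0]
              for emo, m in models.items()}
    return max(scores, key=scores.get)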
For analyzing the effect of global and local prosodic features on emotion recognition performance, separate models are developed using global and local prosodic features [6]. The overall emotion recognition system, consisting of a combination of measures from the global and local prosodic features, is shown in Fig. 3.3. The emotion recognition system based on global prosodic features consists of 8 emotion models, developed using 14-dimensional feature vectors (2 duration parameters, 6 pitch parameters, and 6 energy parameters).
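A 14-dimensional global prosodic feature vector of this kind might be assembled as sketched below. The text specifies only the parameter counts (2 duration, 6 pitch, 6 energy), so the particular statistics used here are illustrative assumptions, not the exact parameters of the cited system.

# Hypothetical assembly of the 14-dimensional global prosodic feature vector.
# The chosen statistics (means, extrema, ranges, medians, etc.) are assumed.
import numpy as np

def global_prosodic_features(f0, energy, durations):
    """f0: pitch contour of voiced frames (Hz); energy: frame energies;
    durations: per-syllable durations (s); all 1-D NumPy arrays."""
    duration_params = [durations.mean(), durations.std()]                      # 2
    pitch_params = [f0.mean(), f0.std(), f0.min(), f0.max(),
                    f0.max() - f0.min(), np.median(f0)]                        # 6
    energy_params = [energy.mean(), energy.std(), energy.min(), energy.max(),
                     energy.max() - energy.min(), np.median(energy)]           # 6
    return np.array(duration_params + pitch_params + energy_params)           # 14-dim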
Emotion recognition performance of the models using global prosodic features is given in Table 3.7. Fear and neutral are recognized with the highest rate of 67%, whereas happiness utterances are identified with only 14% accuracy. It is difficult to attain high performance while classifying the underlying speech emotions using only static prosodic features. This is mainly due to the overlap of the static prosodic features of different emotions. For instance, it is difficult to discriminate pairs like fear and anger, or sarcasm and disgust, using global prosodic features. Utterances of all 8 emotions are misclassified