Table 3.9 Emotion classification performance using the combination of local and global prosodic features computed from entire utterances

Emotions   Emotion recognition performance in %
           Anger  Disgust  Fear  Happiness  Neutral  Sadness  Sarcasm  Surprise
Anger         47       40     3          0        3        0        7         0
Disgust       10       63     0          0        7       10       10         0
Fear           7        0    60          3        0       13        0        17
Happiness     10        0     7         53       20        0        3         7
Neutral        0        0     3          7       74       13        3         0
Sadness        0        0    17          3        0       77        3         0
Sarcasm        0       10     0          0        3        0       84         3
Surprise       0        0    10         17        0        0        6        67

Average recognition performance: 65.63%
Ang. Anger, Dis. Disgust, Hap. Happiness, Neu. Neutral, Sar. Sarcasm, Sur. Surprise
performance further. Table 3.9 shows the recognition performance of the emotion
recognition system developed by combining the measures from global and local
prosodic features.
The average emotion recognition performance after combining the global and
local prosodic features is about 65.63%. Combining the measures from global
and local prosodic features yields no considerable improvement in the emotion
recognition rate. This indicates that the emotion-discriminative properties
of global prosodic features are not complementary to those of local features.
Therefore, local prosodic features alone would be sufficient to perform speech
emotion recognition. The comparison of recognition performance for each emotion,
with respect to global features, local features, and their combination, is shown in Fig. 3.4.
It may be observed from the figure that anger, neutral, sadness, and surprise have
achieved better discrimination using the combination of global and local prosodic
features. Local prosodic features play an important role in the discrimination of
disgust, happiness, and sarcasm. Fear is recognized well by using global prosodic
features.
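
The reported average can be checked directly from the entries of Table 3.9. The following is a minimal sketch, not the authors' evaluation code. It assumes that each row of the table corresponds to the expressed emotion and each column to the recognized emotion (each row sums to 100%, which is consistent with this reading), so that the diagonal holds the per-emotion recognition rates. The names EMOTIONS, CONFUSION, average_recognition, and top_confusions are illustrative.

```python
# Values transcribed from Table 3.9 (in %).
# Assumption: rows = expressed emotion, columns = recognized emotion.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happiness",
            "Neutral", "Sadness", "Sarcasm", "Surprise"]

CONFUSION = [
    [47, 40,  3,  0,  3,  0,  7,  0],   # Anger
    [10, 63,  0,  0,  7, 10, 10,  0],   # Disgust
    [ 7,  0, 60,  3,  0, 13,  0, 17],   # Fear
    [10,  0,  7, 53, 20,  0,  3,  7],   # Happiness
    [ 0,  0,  3,  7, 74, 13,  3,  0],   # Neutral
    [ 0,  0, 17,  3,  0, 77,  3,  0],   # Sadness
    [ 0, 10,  0,  0,  3,  0, 84,  3],   # Sarcasm
    [ 0,  0, 10, 17,  0,  0,  6, 67],   # Surprise
]


def average_recognition(confusion):
    """Mean of the diagonal entries, i.e. of the per-emotion recognition rates (%)."""
    diagonal = [confusion[i][i] for i in range(len(confusion))]
    return sum(diagonal) / len(diagonal)


def top_confusions(confusion, labels, k=2):
    """Return the k largest off-diagonal entries as (row label, column label, %)."""
    pairs = [
        (labels[i], labels[j], confusion[i][j])
        for i in range(len(confusion))
        for j in range(len(confusion))
        if i != j
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]


if __name__ == "__main__":
    print(average_recognition(CONFUSION))
    print(top_confusions(CONFUSION, EMOTIONS))
```

The diagonal mean evaluates to 65.625, which matches the 65.63% stated in the table once rounded. Under the same row/column assumption, the largest off-diagonal entries are anger recognized as disgust (40%) and happiness recognized as neutral (20%).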
3.5.2 Emotion Recognition Systems using Word Level Prosodic Features
In general, different emotions appear to be expressed most effectively at
different parts of an utterance. For example, anger and happiness show their
characteristics mainly at the beginning of the utterance, whereas fear and
disgust may be expressed most effectively in the final part. Based on this intuitive
hypothesis, the initial, middle, and final portions of the utterances are analyzed separately
to capture emotion-specific information. To analyze the characteristics of emotions
at different parts of the utterance, each utterance of IITKGP-SESC is divided into
 
 