Digital Signal Processing Reference
Fig. 3.3 Emotion recognition system using sentence level global and local prosodic features
Table 3.7 Emotion classification performance (in %) using global prosodic features computed over entire utterances

Emotions     Ang.  Dis.  Fear  Hap.  Neu.  Sad   Sar.  Sur.
Anger         28    17    23     3    13    13     3     0
Disgust        7    47     0     0     3    10    33     0
Fear           7     0    67     7     0    10     0     9
Happiness      3     0     7    14    43     3    10    20
Neutral        0     0     7    17    67     0     3     6
Sadness        7     3    17    17     0    40    13     3
Sarcasm        0    10     0    13    20     3    44    10
Surprise       7     0    17    13     3     3    13    44

Average recognition performance: 43.75
Ang. Anger, Dis. Disgust, Hap. Happiness, Neu. Neutral, Sar. Sarcasm, Sur. Surprise
as either neutral, fear, or happiness. The misclassification due to static prosodic features may be reduced by employing dynamic prosodic features for classification. Therefore, the dynamic nature of prosody contours, captured through local prosodic features, is explored in this work for speech emotion recognition.
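The contrast between static (global) and dynamic (local) prosodic features can be sketched as follows. This is a minimal illustration, assuming a pitch contour stored as a 1-D array; the function names, the four statistics, and the three-segment split are illustrative choices, not the authors' exact feature set.

```python
import numpy as np

def global_prosodic_features(contour):
    """Static statistics computed once over the entire utterance-level
    contour; all temporal variation within the utterance is lost."""
    return np.array([contour.mean(), contour.std(),
                     contour.min(), contour.max()])

def local_prosodic_features(contour, n_segments=3):
    """Dynamic features: the same statistics computed per segment,
    so the temporal evolution of the contour is retained."""
    segments = np.array_split(contour, n_segments)
    return np.concatenate([global_prosodic_features(s) for s in segments])

# Hypothetical pitch contour (Hz) for one utterance
pitch = np.array([120, 122, 130, 145, 160, 150, 140, 125, 118], dtype=float)

print(global_prosodic_features(pitch).shape)  # (4,)  one statistic vector
print(local_prosodic_features(pitch).shape)   # (12,) one vector per segment
```

The local variant yields a longer feature vector whose ordering encodes where in the utterance each statistic was observed, which is what allows the classifier to exploit contour dynamics.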
To study the relevance of individual local prosodic features in emotion recognition, three separate ER systems, corresponding to sentence-level duration, intonation, and energy patterns, are developed to capture local emotion-specific information. A score-level combination of these individual local prosodic systems is then performed to obtain the overall emotion recognition performance due to all local utterance-level features. Emotion recognition performance using the individual local prosodic features and their score-level combination is given in Table 3.8. The average emotion recognition performance due to individual local prosodic features is well above that of the global prosodic features. The information of pitch dynamics has the highest dis-
 