Summary and Conclusions - Robust Emotion Recognition Using Spectral and Prosodic Features

words toward emotion specific knowledge is also systematically studied by developing the emotion recognition models using the features derived from different portions of the speech utterances. In this work, SVM models are explored for capturing the emotion discriminative information from the local and global prosodic features. The recognition studies using the proposed prosodic features show that local prosodic features, which represent the temporal variation in prosody, have more discriminative ability when classifying emotions. The word level prosodic analysis shows that words in the final position of the sentence have more emotion discriminative characteristics, and that they recognize emotions almost on par with sentence level prosodic features. The syllable level prosodic analysis shows that the initial and final syllables have more emotion discriminative capacity than the middle syllables.
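The distinction between global and local prosodic features can be illustrated with a small sketch. It assumes, purely for illustration, that global features are utterance-level statistics of a pitch contour and that local features are a fixed-length resampling of the same contour. The function names, the choice of statistics and the number of resampling points are hypothetical and are not taken from the work summarized here.

```python
from statistics import mean, pstdev


def global_prosodic_features(f0):
    """Utterance-level statistics of a pitch contour in Hz (assumed feature set)."""
    return [min(f0), max(f0), mean(f0), pstdev(f0)]


def local_prosodic_features(f0, n_points=5):
    """Resample the contour to n_points values by linear interpolation.

    The result keeps the temporal shape of the contour, which the
    global statistics discard. n_points must be at least 2.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if len(f0) == 1:
        return [float(f0[0])] * n_points
    resampled = []
    for i in range(n_points):
        pos = i * (len(f0) - 1) / (n_points - 1)  # fractional index into f0
        lo = int(pos)
        hi = min(lo + 1, len(f0) - 1)
        frac = pos - lo
        resampled.append(f0[lo] * (1 - frac) + f0[hi] * frac)
    return resampled
```

A rising contour and its time-reversed (falling) version give identical global statistics but different local vectors, which is one way to see why features that keep the temporal variation can separate patterns that global statistics cannot.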

The source, system and prosodic features proposed in this work may represent different aspects of speech emotions. Hence, the combination of measures from the proposed supplementary features is investigated to improve the emotion recognition performance of the models. In this work, all combinations of the proposed features improved the emotion recognition performance, which indicates that the proposed features represent some non-overlapping emotion specific information. Among the various combinations, the highest emotion recognition performance is observed when all three features are combined [3].
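One common way to combine measures from several feature-specific models is a weighted sum of their per-emotion scores. The sketch below assumes that scheme for illustration only. The fusion rule, the weights, the score values and the function names are hypothetical and are not stated in the text above.

```python
def fuse_scores(score_sets, weights):
    """Weighted sum of per-emotion scores from several models (assumed fusion rule)."""
    if len(score_sets) != len(weights):
        raise ValueError("one weight is needed per score set")
    emotions = score_sets[0].keys()
    return {
        emotion: sum(w * scores[emotion] for w, scores in zip(weights, score_sets))
        for emotion in emotions
    }


def decide(scores):
    """Return the emotion with the highest fused score."""
    return max(scores, key=scores.get)


# Made-up scores from three hypothetical models, one per feature type.
source_scores = {"anger": 0.5, "happiness": 0.4, "sadness": 0.1}
system_scores = {"anger": 0.3, "happiness": 0.6, "sadness": 0.1}
prosodic_scores = {"anger": 0.6, "happiness": 0.3, "sadness": 0.1}

fused = fuse_scores([source_scores, system_scores, prosodic_scores], [1, 1, 1])
```

With these made-up numbers the system model alone would pick happiness, while the fused scores favor anger. This is the kind of correction that is possible when the individual features carry non-overlapping information.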

A two-stage emotion recognition system has been proposed to improve the emotion recognition performance further. At the first stage, emotions are categorized into three broad groups, namely active, passive and normal, based on speaking rate. At the second stage, finer classification is performed within each broad group. Combinations of spectral and prosodic features are used for developing the emotion models in both stages [4].
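The control flow of such a two-stage system can be sketched as follows. This is only a minimal outline: the stand-in models, the thresholds, the feature names and the emotions assigned to each group are hypothetical placeholders, and the trained models described above would take their place.

```python
def two_stage_recognize(features, stage1_model, stage2_models):
    """Pick a broad group first, then classify within that group."""
    group = stage1_model(features)             # "active", "passive" or "normal"
    emotion = stage2_models[group](features)   # finer decision inside the group
    return group, emotion


def toy_stage1(features):
    """Stand-in for the first-stage model: thresholds on speaking rate (assumed values)."""
    rate = features["syllables_per_sec"]
    if rate >= 5.5:
        return "active"
    if rate <= 3.5:
        return "passive"
    return "normal"


# Stand-ins for the second-stage models, one per broad group (assumed emotion sets).
toy_stage2 = {
    "active": lambda f: "anger" if f["mean_energy"] > 0.7 else "happiness",
    "normal": lambda f: "neutral",
    "passive": lambda f: "sadness",
}
```

Calling `two_stage_recognize({"syllables_per_sec": 6.0, "mean_energy": 0.9}, toy_stage1, toy_stage2)` routes the utterance to the active group before the finer decision is made, so each second-stage model only has to separate the emotions inside its own group.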

The proposed source, system and prosodic features are also explored for recognizing real-life natural emotions. Single and multi-speaker real-life emotion speech databases are collected from Hindi (Indian national language) movies. Excitation source, spectral and prosodic features are used independently and in combination for natural emotion recognition. The robust features presented in this work also perform better in the case of real-life emotions [5, 6].

7.2 Contributions of the Present Work

Some of the primary contributions of this work are:

Design and development of an emotional speech database in Telugu to promote research on speech emotion processing in an Indian context.

Design and development of a Hindi movie database to represent real life-like emotions for modeling naturalistic emotions.

Sub-syllabic and pitch synchronous spectral features are proposed for recognizing the speech emotions.
