Digital Signal Processing Reference
words toward emotion-specific knowledge is also systematically studied by developing
emotion recognition models using features derived from different portions
of the speech utterances. In this work, SVM models are explored for capturing the
emotion-discriminative information from the local and global prosodic features. The
recognition studies using the proposed prosodic features show that local
prosodic features, which represent the temporal variation in prosody, are more
discriminative than global features when classifying emotions. The word-level prosodic
analysis shows that words in the final position of a sentence carry more
emotion-discriminative information, and that they recognize emotions almost on
par with sentence-level prosodic features. The syllable-level prosodic analysis
shows that initial and final syllables have more emotion-discriminative
capacity than the middle syllables.
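The distinction between global and local prosodic features can be illustrated with a minimal sketch. It assumes a pitch (F0) contour is already available as a list of values in Hz. The function names, the choice of statistics, and the fixed-length resampling are illustrative assumptions, not the feature definitions used in this work.

```python
from statistics import mean, pstdev


def global_prosodic_features(f0):
    """Utterance-level statistics of a pitch contour; time order is discarded."""
    return {
        "mean": mean(f0),
        "min": min(f0),
        "max": max(f0),
        "std": pstdev(f0),
    }


def local_prosodic_features(f0, n_points=5):
    """Resample the contour to n_points (>= 2) by linear interpolation,
    so the shape of the contour over time is preserved."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if len(f0) == 1:
        return [float(f0[0])] * n_points
    resampled = []
    for i in range(n_points):
        pos = i * (len(f0) - 1) / (n_points - 1)  # fractional index into f0
        lo = int(pos)
        hi = min(lo + 1, len(f0) - 1)
        frac = pos - lo
        resampled.append(f0[lo] * (1 - frac) + f0[hi] * frac)
    return resampled


# Two hypothetical contours: one rising, one falling.
rising = [100.0, 120.0, 140.0, 160.0, 180.0]
falling = list(reversed(rising))
```

In this sketch the rising and falling contours produce the same global statistics but different local feature vectors, which is one way to see why features that keep the temporal variation can separate classes that utterance-level statistics cannot.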
The source, system and prosodic features proposed in this work may represent
different aspects of speech emotions. Hence, combinations of measures from the
proposed supplementary features are investigated to improve the emotion recognition
performance of the models. In this work, all combinations of the proposed features
enhanced the emotion recognition performance, which indicates that the
features carry some non-overlapping emotion-specific information. Among the
various combinations, the highest emotion recognition performance is observed when
all three features are combined [3].
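One common way to combine measures from several feature-specific models is a weighted sum of their per-emotion scores. The sketch below assumes that form of combination; the text does not specify the fusion rule, and the weights, emotion labels, and score values are made-up illustrations rather than results from this work.

```python
def fuse_scores(scores_by_feature, weights):
    """Weighted sum of per-emotion scores from several feature-specific models."""
    emotions = next(iter(scores_by_feature.values())).keys()
    fused = {}
    for emotion in emotions:
        fused[emotion] = sum(
            weights[name] * scores[emotion]
            for name, scores in scores_by_feature.items()
        )
    return fused


def predict(fused):
    """Return the emotion with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical scores from three models, one per feature type.
scores = {
    "source":   {"anger": 0.5, "happiness": 0.4, "neutral": 0.1},
    "system":   {"anger": 0.3, "happiness": 0.5, "neutral": 0.2},
    "prosodic": {"anger": 0.6, "happiness": 0.3, "neutral": 0.1},
}
# Equal weights are an assumption; in practice they would be tuned.
weights = {"source": 1 / 3, "system": 1 / 3, "prosodic": 1 / 3}

fused = fuse_scores(scores, weights)
decision = predict(fused)
```

With these illustrative numbers the system model alone would favor happiness, while the fused decision is anger, which shows how evidence from the other two feature types can change the outcome.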
A two-stage emotion recognition system has been proposed to improve the emotion
recognition performance further. In the first stage, emotions are categorized into
three broad groups, namely active, passive and normal, based on speaking rate. In
the second stage, finer classification is performed within each broad group. Here,
combinations of spectral and prosodic features are used for developing the emotion
models in both stages [4].
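The two-stage idea can be sketched as a coarse decision followed by a finer one restricted to the chosen group. Everything specific below is an assumption made for illustration: the speaking-rate thresholds, the assignment of emotions to groups, the two-dimensional feature vectors, and the nearest-centroid rule that stands in for the trained emotion models.

```python
import math

# Hypothetical speaking-rate thresholds in syllables per second.
SLOW_MAX = 3.5
FAST_MIN = 5.5

# Hypothetical second-stage models: one centroid per emotion within each group.
GROUP_MODELS = {
    "active":  {"anger": [0.9, 0.8], "happiness": [0.7, 0.3]},
    "normal":  {"neutral": [0.5, 0.5], "surprise": [0.6, 0.9]},
    "passive": {"sadness": [0.1, 0.2], "boredom": [0.3, 0.1]},
}


def broad_group(speaking_rate):
    """Stage 1: map speaking rate to one of three broad groups."""
    if speaking_rate >= FAST_MIN:
        return "active"
    if speaking_rate <= SLOW_MAX:
        return "passive"
    return "normal"


def classify(speaking_rate, features):
    """Stage 2: pick the nearest emotion centroid within the stage-1 group."""
    group = broad_group(speaking_rate)
    candidates = GROUP_MODELS[group]
    emotion = min(candidates, key=lambda e: math.dist(features, candidates[e]))
    return group, emotion


result = classify(6.0, [0.85, 0.75])
```

Restricting the second stage to one group means each fine classifier only has to separate a few emotions, which is the motivation for the staged design.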
The proposed source, system and prosodic features are also explored for recognizing
real-life natural emotions. Single-speaker and multi-speaker real-life emotional speech
databases are collected from Hindi (an official language of India) movies. Excitation
source, spectral and prosodic features are used independently and in combination for
natural emotion recognition. The robust features presented in this work also perform
better in the case of real-life emotions [5, 6].
7.2 Contributions of the Present Work
Some of the primary contributions of this work are:
• Design and development of an emotional speech database in Telugu to promote
  research on speech emotion processing in an Indian context. Design and development
  of a Hindi movie database to represent real-life-like emotions for modeling
  naturalistic emotions.
• Sub-syllabic and pitch synchronous spectral features are proposed for recognizing
  the speech emotions.
 