Digital Signal Processing Reference
Table 1.2 (continued)
Combining evidence from different features has been shown to perform better in many
speech tasks
This may be due to the supplementary or complementary evidence provided by the different
features. Hence, in this work the following combinations of features may be explored to
study emotion recognition performance:
− Excitation and spectral features
− Spectral and prosodic features
− Excitation and prosodic features
− Excitation, spectral and prosodic features
Multilevel classification systems provide better classification than single-level classification
− A two-level emotion classification system is proposed
− At the first level, all the emotions are divided into a few broad groups, where similar
(confusable) emotions are placed in different groups
− At the second level, the emotions within each broad group are further classified
The ultimate goal of any speech emotion recognition system is to process real-life emotions
− A combination of different features may be used for real-life emotion recognition
− Hindi movie clips may be used to represent real-life emotions
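The two-level classification scheme listed in the table can be sketched as follows. This is a minimal illustration only: the group names, the assignment of emotions to groups, the centroid values, and the nearest-centroid classifiers are all hypothetical, and are not the grouping or the models used in this work.

```python
import math

def nearest(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

# Hypothetical level-1 model: one centroid per broad group.
GROUP_CENTROIDS = {
    "group_a": (0.0, 0.0),
    "group_b": (10.0, 10.0),
}

# Hypothetical level-2 models: one classifier per broad group.
EMOTION_CENTROIDS = {
    "group_a": {"sadness": (-1.0, 0.0), "neutral": (1.0, 0.0)},
    "group_b": {"anger": (9.0, 10.0), "happiness": (11.0, 10.0)},
}

def classify_two_level(x):
    """Level 1 picks the broad group; level 2 picks an emotion within it."""
    group = nearest(GROUP_CENTROIDS, x)
    emotion = nearest(EMOTION_CENTROIDS[group], x)
    return group, emotion
```

The second-level classifier only has to separate the emotions inside one group, so each of the two decisions is simpler than a single flat decision over all emotions.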
syllabic Spectral Features
provides details on the extraction of spectral
features from sub-syllabic regions such as consonant, vowel, and consonant-vowel
(CV) transition regions. The extraction of spectral features using pitch-synchronous
analysis is also explained. The development of emotion recognition systems using
Gaussian mixture models is discussed.
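A common way to use Gaussian mixture models for this task is to train one model per emotion and assign an utterance to the emotion whose model gives the highest average log-likelihood over its feature frames. The sketch below shows only that decision rule, assuming diagonal covariances. The model parameters and the two-dimensional frames are made-up placeholders, not values from this work, and training is left out.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify_utterance(frames, models):
    """Pick the emotion whose GMM gives the highest average frame log-likelihood."""
    def avg_ll(gmm):
        return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
    return max(models, key=lambda emotion: avg_ll(models[emotion]))

# Hypothetical two-component models, one per emotion (placeholder parameters).
MODELS = {
    "anger": [(0.5, (2.0, 2.0), (1.0, 1.0)), (0.5, (3.0, 1.0), (1.0, 1.0))],
    "sadness": [(0.5, (-2.0, -2.0), (1.0, 1.0)), (0.5, (-3.0, -1.0), (1.0, 1.0))],
}
```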
Level Prosodic Features
discusses in detail the use of global and local
prosodic features for developing emotion recognition systems. Global (static) and
local (dynamic) prosodic features extracted from sentences, words, and syllables
are proposed for classifying speech emotions. The contribution of prosodic
features from different speech regions (initial, middle, and final) is also analyzed
using local and global features. Support vector machine models are used to capture
emotion-specific prosody from the proposed features.
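The distinction between global (static) and local (dynamic) features can be illustrated on a pitch contour. In this sketch, global features are summary statistics over a segment, local features are the contour resampled to a fixed number of points, and the regions are equal thirds of the segment. These definitions and the function names are assumptions made for illustration, not the exact feature definitions used in the chapter.

```python
import statistics

def global_features(contour):
    """Static summary of a contour: mean, min, max, standard deviation."""
    return {
        "mean": statistics.fmean(contour),
        "min": min(contour),
        "max": max(contour),
        "std": statistics.pstdev(contour),
    }

def local_features(contour, n_points=5):
    """Dynamic view: the contour resampled to n_points by linear interpolation."""
    if n_points < 2 or len(contour) < 2:
        raise ValueError("need at least two points")
    last = len(contour) - 1
    resampled = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into the contour
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return resampled

def split_regions(contour):
    """Split a contour into initial, middle, and final thirds."""
    n = len(contour)
    a, b = n // 3, 2 * n // 3
    return {"initial": contour[:a], "middle": contour[a:b], "final": contour[b:]}
```

Applying `global_features` or `local_features` to each entry of `split_regions(contour)` gives region-wise features, which is one way to compare the contribution of the initial, middle, and final parts of an utterance.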
Source, Spectral and Prosodic Features
discusses the combination of complementary
and supplementary evidence provided by the source, system, and
prosodic features for improving emotion recognition performance. This chapter
provides emotion recognition performance studies for various combinations of
features. Here, evidence from the various features is combined using an optimal linear
weighted combination scheme.
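One way to read the weighted combination is as score-level fusion: the classifier for each feature stream produces per-emotion scores, the scores are mixed with weights that sum to one, and the weights are tuned on held-out data. The sketch below does a coarse grid search over the weight for two streams. The grid step, the score format, and the function names are assumptions; the text here does not say how the optimal weights are found.

```python
def fuse(scores_a, scores_b, w):
    """Linear weighted combination of two per-emotion score dicts (weights w and 1 - w)."""
    return {e: w * scores_a[e] + (1 - w) * scores_b[e] for e in scores_a}

def predict(scores):
    """Return the emotion with the highest fused score."""
    return max(scores, key=scores.get)

def best_weight(val_a, val_b, labels, steps=10):
    """Grid-search w in [0, 1] for the highest accuracy on validation utterances.

    val_a and val_b hold one score dict per utterance for streams A and B.
    Ties are resolved in favour of the smallest w.
    """
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        correct = sum(
            predict(fuse(a, b, w)) == y for a, b, y in zip(val_a, val_b, labels)
        )
        acc = correct / len(labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

With three streams the same idea applies with two free weights, since the third is fixed by the sum-to-one constraint.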
discusses the development of a two-stage emotion recognition system based on
speaking rate. In this case, emotions are initially classified into broad categories