Digital Signal Processing Reference
In-Depth Information
[Figure: bar graphs in three panels (a), (b), and (c), each with a vertical axis running from 0 to 100. Labels: LPCC, MFCC, LPCC+Formants, MFCC+Formants; Entire speech, Vowel, Consonant, CV Transition, Pitch Synchronous.]
Fig. 2.12 Comparison of emotion recognition performance of different proposed spectral features
with respect to the entire speech, sub-syllabic regions, and pitch synchronous analysis on Set1, Set2,
and Set3 of IITKGP-SESC. (a) Emotion recognition performance using Set1, (b) emotion recognition
performance using Set2, and (c) emotion recognition performance using Set3
with Set3, emotion recognition results of Set1 and Set2 are given in Table 2.11.
Recognition performance on the Set2 data set is 2 % higher than on the Set3 data
set. This is mainly due to the influence of speaker-specific information during
emotion classification. In the case of Set1, the recognition performance is about 4-5 %
higher than that of Set3 and about 2-3 % higher than that of Set2. This
improvement in emotion recognition is due to text- and speaker-specific information.
The emotion recognition performance of the proposed spectral features
on Set1, Set2, and Set3 of IITKGP-SESC is compared in the bar graph shown in
Fig. 2.12.
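To make the idea of region-restricted spectral features more concrete, the following is a minimal sketch of computing MFCCs from only one region of an utterance (for example, a CV transition). It is an illustration under stated assumptions, not the implementation used in this chapter: the function name mfcc_for_region, the 20 ms frame, 10 ms shift, 24 mel filters, 13 coefficients, and 512-point FFT are all hypothetical choices, and the region boundaries start_s and end_s are assumed to be supplied by some separate segmentation step that is not shown.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale approximation (one of several in use).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose centres are equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mfcc_for_region(signal, sr, start_s, end_s, frame_ms=20.0, hop_ms=10.0,
                    n_filters=24, n_ceps=13, n_fft=512):
    """Return an (n_frames, n_ceps) MFCC matrix for signal[start_s:end_s].

    All default parameter values are illustrative assumptions.
    """
    # Keep only the samples inside the requested region.
    seg = np.asarray(signal, dtype=float)[int(start_s * sr):int(end_s * sr)]
    frame_len = int(sr * frame_ms / 1000.0)
    hop = int(sr * hop_ms / 1000.0)
    if seg.size < frame_len:
        # Region is shorter than one frame: no features.
        return np.empty((0, n_ceps))

    # Slice the region into overlapping Hamming-windowed frames.
    n_frames = 1 + (seg.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = seg[idx] * np.hamming(frame_len)

    # Power spectrum -> mel filterbank energies -> log.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)

    # Unnormalized DCT-II over the filterbank axis gives the cepstral coefficients.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_filters))
    return log_mel @ dct_basis.T
```

Calling the function once with the boundaries of the whole utterance and once with the boundaries of a single region yields two feature sets that could then be passed to the same classifier, which is the kind of comparison summarized in Fig. 2.12.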
2.6 Summary
In this chapter, spectral features derived from sub-syllabic regions and pitch
synchronous analysis are proposed for recognizing emotions from speech. IITKGP-SESC
and Emo-DB are used to carry out emotion classification using the proposed
spectral features. LPCCs, MFCCs, and formant features are used to represent
vocal tract information. Spectral features derived from sub-syllabic regions are
analyzed independently for classifying the emotions. The results suggest
that the entire speech signal may not be necessary to recognize the underlying emotions.
Hence, the redundant information present in the steady region of the vowel may be
excluded from feature extraction. The spectral features extracted from CV
transition regions achieve emotion recognition performance almost comparable
to that obtained using the entire speech signal. Pitch synchronously
 
 