Fig. 4.1 Emotion recognition system using the combination of source and spectral features
4.3 Emotion Recognition using Combination of Excitation Source and Vocal Tract System Features
The studies carried out in the previous chapters show that both the excitation source
and vocal tract components of speech have some emotion-discriminative power.
These two features represent two different aspects of the speech production mechanism.
Therefore, to exploit the emotion-specific information provided by each of these
two features, their measures are combined through appropriate weighting factors.
The combination process is shown in Fig. 4.1. The weighting rule for combining the confidence scores (measures) of the individual features is $c_f = \sum_{i=1}^{m} w_i c_i$, where $c_f$ is the combined confidence score derived from the confidence scores of the individual features, $w_i$ and $c_i$ are the weighting factor and confidence score corresponding to the $i$th feature, respectively, and $m$ is the number of individual features whose scores are combined. In this work, the measures of the ER systems developed using the two features are combined. One of the weights ($w_i$) is varied from 0 to 1 in steps of 0.1, and the other weight is determined using the formula $w_j = \frac{1 - w_i}{m - 1}$, where $j = 2$ and $i = 1$.

The emotion recognition performance obtained using different combinations of weighting factors is given in Fig. 4.2. The best recognition performance, about 74%, is obtained with weighting factors of 0.6 and 0.4 applied to the confidence scores of the system and source features, respectively. From the results (see Table 4.2), the recognition performance for anger, disgust, fear, happiness, and sadness has improved in the combined system, whereas for neutral, sarcasm, and surprise there is no considerable improvement. The recognition performance of the combined system is around 4% higher than that of the system developed using spectral features alone. This improvement may be due to the supplementary nature of the measures provided by the excitation source features. The details of the classification performance using the fusion of spectral and excitation source information are given in Table 4.2. From the table, it may be noted that spectral features are more discriminative with respect to emotions than excitation source features. Spectral and excitation source features have performed similarly in the case of anger and fear. A comparison of the emotion recognition performance obtained using spectral, excitation source, and spectral + source features may be visualized from the bar graph shown in Fig. 4.3.
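
The score-level combination described in this section can be sketched in Python as follows. This is a minimal illustration only, not the authors' implementation: the function names (`combine_scores`, `complementary_weight`, `sweep_weights`), the array shapes, and the choice of the highest combined score as the recognized emotion are all assumptions made for the example.

```python
import numpy as np


def combine_scores(scores, weights):
    """Weighted sum c_f = sum_i w_i * c_i, computed per emotion class.

    scores  : array of shape (m, n_classes), one row per feature type
    weights : array of shape (m,), one weighting factor per feature type
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ scores


def complementary_weight(w_i, m=2):
    """Remaining weight w_j = (1 - w_i) / (m - 1); for m = 2 this is 1 - w_i."""
    return (1.0 - w_i) / (m - 1)


def sweep_weights(system_scores, source_scores, labels, step=0.1):
    """Vary the system-feature weight from 0 to 1 and record accuracy.

    system_scores, source_scores : arrays of shape (n_utterances, n_classes)
    labels                       : true class index of each utterance
    Assumption: the recognized emotion is the class with the highest
    combined confidence score.
    """
    system_scores = np.asarray(system_scores, dtype=float)
    source_scores = np.asarray(source_scores, dtype=float)
    labels = np.asarray(labels)
    n_steps = int(round(1.0 / step))
    results = []
    for k in range(n_steps + 1):
        w_sys = k * step
        w_src = complementary_weight(w_sys)
        fused = w_sys * system_scores + w_src * source_scores
        accuracy = float(np.mean(np.argmax(fused, axis=1) == labels))
        results.append((round(w_sys, 1), round(w_src, 1), accuracy))
    return results
```

Calling `sweep_weights` on the per-utterance confidence scores of the two systems gives one accuracy value per weight pair, which is the kind of curve plotted in Fig. 4.2. The weight pair with the highest accuracy would then be used in `combine_scores`.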
 
 