Keywords Berlin dataset • Mel frequency cepstral coefficients (MFCC) • Multilayer perceptron (MLP) • NAW dataset • Speech emotion profiling • Speech emotion recognition
2.1 Introduction
During the last century, many researchers from different disciplines have tried to postulate a few basic emotions underlying the entire range of emotions that humans experience. One model suggests that every emotion is composed of different levels of certain basic components, including arousal, intensity, aversion, self-directedness, and others. Among the many models, the prevailing one conjectures that emotions arise much the same way as colors do, presenting a myriad of hues mixed from a few basic constituents [7]. To date, cognitive science does not possess a test that can decide between the various competing models of basic emotion. However, researchers in various disciplines agree that some emotions are universally accepted as basic and many others as secondary. Cornelius labeled six emotions as the "Big Six" [11]: angry, happy, sad, fear, surprise, and disgust. These six served as the starting point for this study; in this chapter, however, we focus only on the angry, sad, and happy emotions, with neutral as the emotionless state.
From an engineering perspective, emotion recognition is a fairly new field of research compared with its long history in psychology. Knowing that humans convey and perceive underlying emotion in interaction, scientists and researchers can today use the tools of signal processing to analyze the massive amount of information transmitted from a speaker to a listener. Yet we still struggle to understand emotion and, more critically, to capture and process it in a form that is useful for technical purposes.
In 2001, Scherer et al. conducted a study of vocal emotion portrayals in nine countries across Europe, the United States, and Asia, using content-free sentences expressing anger, sadness, fear, joy, and a neutral voice [9]. They found that recognition accuracy generally decreased with increasing language dissimilarity, in spite of the use of language-free speech samples, and concluded that culture- and language-specific paralinguistic patterns may influence the emotion recognition process.
In this chapter, we address this issue by proposing Mel frequency cepstral coefficients (MFCC) as our features for speech emotion recognition. Our feature extraction method is based on Slaney's approach [8], coupled with the WEKA multilayer perceptron (MLP) classifier [12]. These are adopted to identify the three basic emotions, namely the angry, sad, and happy emotional states. Initially, two different speech emotion datasets, the NAW dataset (American actors) and the Berlin dataset (German actors), were employed to train and test the accuracy of the proposed system using the K-fold cross-validation technique; a minimal sketch of such a pipeline is given at the end of this section. Next, we extended our scope by using speech data recorded in real time while driving, in order to analyze and understand the driver's behavior [6]. The driver was asked to interact with the passenger as well as