expressions and understandings of human behavior and respond to a growing need
for applications related to human behavior analysis (Narayanan and Gregoriou 2013).
For example, the telecommunications industry suffers from a churn rate of
approximately 30 %, while maintaining high customer retention is of great
importance (Jahromi et al. 2010). In this context, it is crucial to detect emotional
traits that provide information about the speakers' intentions and emotional states. These
traits are multimodal, in the sense that they can be inferred from the paralinguistic
properties of the utterances, from the structural units of the interaction and their
flow, as well as from the linguistic content. At the same time, the perception of the
speakers' emotional states and the definition of appropriate values to describe
them are far from trivial. This work focuses on most of the aforementioned
aspects, setting aside for the time being the investigation of the linguistic content.
For the needs of our task, we extracted emotionally colored units from our call
center corpus. These units were in turn labeled by a human annotator as positive
or negative. A large number of speech and other context-related features
were extracted for each unit to train mathematical models that can be used to predict
the label of an unseen emotional unit. A subset of the corpus was further annotated in
terms of turn management types, and the resulting annotations were then associated
with the emotional labels. In the next section, we describe the collected corpus and
the procedures applied to annotate it, as well as a small-scale experiment aimed at
assessing the perception of emotions in conversations from this domain.
In Sect. 20.3 we describe the automatic feature extraction process and the machine
learning models proposed for the automatic classification tasks together with the
obtained results. Section 20.4 is dedicated to the turn management annotation
process, the study of overlapping speech in the corpus, and the association of
turn management values with the emotional ones. Finally, Sect. 20.5 concludes the
presented work and provides future directions.
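The pipeline outlined above (unit extraction, binary labeling, feature extraction,
and supervised model training) can be sketched as follows. This is an illustrative
sketch only, using an SVM on synthetic feature vectors; the corpus, the concrete
acoustic features, and the model configuration of the actual study differ.

```python
# Minimal sketch (not the authors' implementation) of the setup described
# above: one feature vector per emotional unit, binary positive/negative
# labels, and an SVM trained to predict the label of unseen units.
# All feature values here are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_units, n_features = 200, 12  # e.g. prosodic/spectral descriptors per unit
X = np.vstack([
    rng.normal(-0.5, 1.0, (n_units // 2, n_features)),  # "negative" units
    rng.normal(+0.5, 1.0, (n_units // 2, n_features)),  # "positive" units
])
y = np.array(["negative"] * (n_units // 2) + ["positive"] * (n_units // 2))

# Hold out unseen units, then train a scaled RBF-kernel SVM and evaluate.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

Feature scaling before the SVM matters in practice, since acoustic descriptors
(e.g. durations vs. spectral energies) live on very different numeric ranges.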
20.1.1 Related Work on Automatic Emotion and Conflict Detection
A rich set of combined speech features, typically related to the temporal, prosodic,
and spectral content of the speech signal, has been investigated and is often
employed to capture any underlying emotional pattern reflected in these features
(Schuller et al. 2010, 2011; Morrison et al. 2007). Previous works
have studied emotion recognition and more specifically anger in speech (Neiberg
and Elenius 2008; Lee and Narayanan 2005; Burkhardt et al. 2009; Polzehl et al.
2011; Erden and Arslan 2011). On a task of discriminating five emotions (fear,
anger, sadness, neutral, and relief) in real-world audio data from a French human-
human call center corpus, an average detection rate of 45 % was reported with
only 107 acoustic features (Vidrascu and Devillers 2007). Support vector machines
(SVMs) and Gaussian mixture models (GMMs) have shown a reasonable accuracy