20.2.1 Data Annotation
The audio files were next annotated with a twofold aim: (a) to identify instances of
emotional behavior expressed in the participants' conversation and (b) to assign
an emotional label to each of them. The data annotation was performed by
an expert annotator. The selection of the appropriate utterances relied on the
annotator's perception of verbal and paralinguistic cues expressing the speakers'
sentiments and feelings. Specifically, the annotator's task was to detect units that
are emotionally colored, i.e., that deviate from a nonemotional and/or neutral way
of speaking in terms of either linguistic expressions or prosodic and paralinguistic
properties of speech (such as loudness, intensity, etc.) and that carry information
about emotions the speakers are actually experiencing. These units may be of
varying lengths, i.e., interjections, words, phrases, or utterances. Accordingly,
conversational segments that the annotator judged neutral or not emotionally
colored were left unmarked and unlabeled.
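A minimal sketch of how such annotation units could be represented is given here. The names `UnitKind`, `Segment`, and `marked_units`, the time fields, and the sample transcripts are hypothetical and are not taken from the described corpus; the sketch only mirrors the rule that neutral segments stay unmarked.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class UnitKind(Enum):
    """Unit lengths named in the text (hypothetical encoding)."""
    INTERJECTION = "interjection"
    WORD = "word"
    PHRASE = "phrase"
    UTTERANCE = "utterance"


@dataclass(frozen=True)
class Segment:
    start_s: float                   # assumed: start time in seconds
    end_s: float                     # assumed: end time in seconds
    transcript: str
    kind: Optional[UnitKind] = None  # None: judged neutral, left unmarked

    @property
    def is_marked(self) -> bool:
        return self.kind is not None


def marked_units(segments: List[Segment]) -> List[Segment]:
    """Keep only the segments the annotator marked as emotionally colored."""
    return [s for s in segments if s.is_marked]


# Invented sample data, for illustration only.
segments = [
    Segment(0.0, 1.2, "good morning"),                 # neutral, unmarked
    Segment(1.2, 1.6, "ugh", UnitKind.INTERJECTION),   # marked
    Segment(1.6, 4.0, "this is the third time I call", UnitKind.UTTERANCE),
]
print([s.transcript for s in marked_units(segments)])
```

Using `None` for the unit kind, rather than an explicit "neutral" value, follows the text's statement that neutral segments receive no mark or label at all.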
The identified units were in turn annotated as positive or negative. This set of
values seems to be appropriate for the goals of the specific task, i.e., to describe the
attitude of the speakers toward each other as well as their evaluation of the provided
services or of the reported problems. In this domain, it is important to assess, on
the one hand, whether the customers are eventually satisfied with the services they
get or seem to evaluate them negatively and, on the other hand, whether operators
express the intent to resolve problematic issues, soothe possible negative effects, or
are unable to fulfill the customers' requests.
In this sense, variations or gradations of similar emotions within either the
positive or the negative spectrum are grouped under one of the
two values. For example, whether speakers express helplessness, frustration,
or anger, what matters is that they ultimately express a negative stance; thus,
all of these emotions are labeled as negative.
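The grouping described above can be sketched as a simple lookup. This is an illustrative assumption, not the annotation tooling used for the corpus: the table `POLARITY` and the function `to_polarity` are hypothetical names, the three negative entries come from the example in the text, and mapping satisfaction to positive is an assumption added for contrast.

```python
from typing import Dict

# Hypothetical lookup table. The negative entries follow the example in the
# text; the positive entry is an assumption added for illustration.
POLARITY: Dict[str, str] = {
    "helplessness": "negative",
    "frustration": "negative",
    "anger": "negative",
    "satisfaction": "positive",  # assumed
}


def to_polarity(emotion: str) -> str:
    """Collapse a fine-grained emotion name into the binary label set."""
    key = emotion.strip().lower()
    if key not in POLARITY:
        # Fail loudly instead of guessing a polarity for an unknown emotion.
        raise KeyError(f"no polarity assigned to emotion {emotion!r}")
    return POLARITY[key]


print(to_polarity("Frustration"))
print(to_polarity("anger"))
```

Raising an error for unlisted emotions keeps the sketch from silently assigning a polarity that the annotation scheme never defined.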
A second reason for selecting this binary set of values was to avoid ambiguity
that is expected to affect the automatic processing phase as well as the evaluation
of the data by human judges. Specifically, the greater the number or the granularity
of the labels, the more complex the recognition task becomes, especially when
certain labels are semantically close to each other.
Initially, the selected annotation labels consisted of a set of 25 categorical values
tailored to the needs of the call center domain and inspired by inventories of
categories representing emotions and related states as suggested in EmotionML
(Schröder 2013; Schröder and Pelachaud 2012). In practice, annotating the data
with this fine-grained set of labels proved hard, because many speech units
offered too few perceptual cues to disambiguate between semantically similar
labels. For
example, though it was easy to discern between units expressing opposite emotional
states, such as satisfaction and anger, there were many ambiguous units that were
perceptually representative of more than one emotional label (e.g.,
anger/irritation/frustration).
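The following sketch illustrates why a unit that is ambiguous among fine-grained labels can still receive a single coarse label. The sets `NEGATIVE` and `POSITIVE` and the function `shared_polarity` are hypothetical; the membership of each set is an assumption based on the examples mentioned in the text, not the actual 25-label inventory.

```python
from typing import Iterable, Optional

# Assumed groupings, based only on the emotions named as examples in the text.
NEGATIVE = {"anger", "irritation", "frustration", "helplessness"}
POSITIVE = {"satisfaction"}


def shared_polarity(candidates: Iterable[str]) -> Optional[str]:
    """Return the polarity common to all candidate labels, or None if the
    candidates are empty, unknown, or span both polarities."""
    names = {c.strip().lower() for c in candidates}
    if names and names <= NEGATIVE:
        return "negative"
    if names and names <= POSITIVE:
        return "positive"
    return None


# A unit the annotator could not pin down to one fine-grained label.
ambiguous = "anger/irritation/frustration".split("/")
print(shared_polarity(ambiguous))
print(shared_polarity(["satisfaction", "anger"]))
```

Under these assumed groupings, the three competing candidates agree on polarity, so the ambiguity disappears at the binary level, while a unit whose candidates span both polarities remains unresolved.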