Table 20.4 Inter-annotator agreement between the Greek (left) and the Italian (right) subjects

            GR inter-annotator agreement             IT inter-annotator agreement
            Positive   Negative   Neutral            Positive   Negative   Neutral
Positive      4.2 %     17.6 %     2.7 %   Positive    3.8 %      1.2 %     0
Negative      0.4 %     69.5 %     1.2 %   Negative    9.4 %     36.1 %     0
Neutral       0.1 %      4.0 %     0.3 %   Neutral    24.2 %     25.2 %     0.1 %
The low agreement score (40 %) between the Italian annotators is mostly due to one of them using the "neutral" label frequently while the other did not, as can be seen in the detailed inter-annotator agreement scores in Table 20.4. The "neutral" label was mainly assigned to units whose paralinguistic properties did not allow the annotator to infer whether they carried a positive or a negative value. This also explains why the instances labeled as neutral by one of the Italian subjects were split almost equally (roughly 50 % each) between positive and negative labels by the other Italian subject (cf. Table 20.4).
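For concreteness, the 40 % figure corresponds to the observed (raw) agreement, i.e., the sum of the diagonal cells of the Italian matrix in Table 20.4. The short sketch below recomputes it from the table; the Cohen's kappa function is included only as an illustrative chance-corrected variant and is not a score reported here.

```python
import numpy as np

# Pairwise agreement matrices from Table 20.4, expressed as fractions of all units.
# Rows: labels of annotator 1; columns: labels of annotator 2
# (label order: positive, negative, neutral).
gr = np.array([[4.2, 17.6, 2.7],
               [0.4, 69.5, 1.2],
               [0.1,  4.0, 0.3]]) / 100.0

it = np.array([[ 3.8,  1.2, 0.0],
               [ 9.4, 36.1, 0.0],
               [24.2, 25.2, 0.1]]) / 100.0

def observed_agreement(m):
    """Proportion of units that received the same label from both annotators."""
    return float(np.trace(m))

def cohens_kappa(m):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = float(np.trace(m))
    p_e = float(np.sum(m.sum(axis=1) * m.sum(axis=0)))  # expected agreement from marginals
    return (p_o - p_e) / (1.0 - p_e)

print(f"IT observed agreement: {observed_agreement(it):.1%}")   # ~40 %
print(f"GR observed agreement: {observed_agreement(gr):.1%}")
print(f"IT Cohen's kappa: {cohens_kappa(it):.2f}")
print(f"GR Cohen's kappa: {cohens_kappa(gr):.2f}")
```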
This pilot experiment suggests that paralinguistic cues are essential for the perception of emotion in speech, and that lexical and linguistic information drastically improves annotation accuracy. These preliminary results therefore indicate that decoding positive and/or negative emotion in speech units largely depends on knowledge of the native language and on the communication context. Native speakers appear to be at an advantage over nonnative ones because they can infer linguistic and semantic content in addition to exploiting prosodic and paralinguistic information. This assumption, however, needs to be verified by further experimentation covering more elaborate conditions and an adequate number of nonnative subjects.
20.3 Automatic Emotion Classification Experiments
In order to automatically classify the data at hand, the speech units were shuffled and grouped into a training (TR) set and a testing (TE) set in such a way that the two sets contain disjoint speakers. Also, to avoid bias toward any particular category during the training and testing phases, the corpus was split into parts that contain similar proportions of positive/negative, operator/customer, and male/female speech units (cf. Table 20.5).
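A minimal sketch of how such a speaker-disjoint, proportion-balanced split can be produced is given below. The field names (speaker_id, polarity, role, gender) and the greedy balancing heuristic are illustrative assumptions, not the authors' actual procedure.

```python
import random
from collections import Counter

def speaker_disjoint_split(units, test_ratio=0.18, seed=42):
    """Split on speakers (not units) so that TR and TE contain disjoint speakers,
    then keep the candidate split whose class/role/gender proportions match best.

    Each unit is assumed to be a dict such as:
    {"speaker_id": "op_03", "polarity": "negative", "role": "operator",
     "gender": "female", "features": [...]}
    """
    rng = random.Random(seed)
    speakers = sorted({u["speaker_id"] for u in units})

    def proportions(subset):
        n = len(subset) or 1
        keys = [(u["polarity"], u["role"], u["gender"]) for u in subset]
        return {k: v / n for k, v in Counter(keys).items()}

    best, best_diff = None, float("inf")
    for _ in range(200):                      # try a number of random speaker splits
        rng.shuffle(speakers)
        n_test = max(1, int(len(speakers) * test_ratio))
        test_speakers = set(speakers[:n_test])
        tr = [u for u in units if u["speaker_id"] not in test_speakers]
        te = [u for u in units if u["speaker_id"] in test_speakers]
        p_tr, p_te = proportions(tr), proportions(te)
        diff = sum(abs(p_tr.get(k, 0) - p_te.get(k, 0)) for k in set(p_tr) | set(p_te))
        if diff < best_diff:
            best, best_diff = (tr, te), diff
    return best
```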
The TR set (1,150 units) was used for training two different machine learning
algorithms to discriminate between emotionally positive and negative speech units.
The TE set (246 units) was used for assessing the algorithms' performance on
unseen positive and negative speech examples.
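The chapter does not name the two learning algorithms at this point, so the sketch below only illustrates the overall train/test protocol, with scikit-learn's SVM and Random Forest classifiers standing in as placeholders; tr_units and te_units are assumed to come from the splitting sketch above, with precomputed feature vectors stored under the "features" key.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import SVC

# tr_units, te_units as produced by speaker_disjoint_split() above.
X_tr = [u["features"] for u in tr_units]   # acoustic/prosodic feature vectors (assumed precomputed)
y_tr = [u["polarity"] for u in tr_units]   # "positive" or "negative"
X_te = [u["features"] for u in te_units]
y_te = [u["polarity"] for u in te_units]

for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("Random Forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    clf.fit(X_tr, y_tr)                    # train on the TR set (1,150 units)
    y_pred = clf.predict(X_te)             # evaluate on the unseen TE set (246 units)
    print(name, f"accuracy: {accuracy_score(y_te, y_pred):.3f}")
    print(classification_report(y_te, y_pred, digits=3))
```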
 