For multimodal video analysis, either the ELAN (Elan) or Anvil (Kipp, 2001)
annotation tools are commonly used. The growth of large corpora of
annotated multimodal data has raised questions about creating, coding,
processing, and managing more extensive multimodal resources, notably in
the context of European collaboration projects.
For instance, the European Telematics project MATE (Multilevel
Annotation Tools Engineering) aimed to facilitate the use and reuse of
spoken language resources, coding schemes and tools, and produced
the NITE workbench (Bernsen et al., 2002) that addresses theoretical
issues. Dybkjaer et al. (2002) provided an overview of the tools and
standards for multimodal annotation.
2.5 Inter-coder agreement
A number of methodological recommendations have been put
forward for validating the data and ensuring coherent and reliable
annotations. Mutual agreement between the annotators on the assigned
categories is one of the standard measures, and much attention has been
devoted to it (Cavicchio and Poesio, 2009; Rietveld and van Hout, 1993).
It is important to distinguish percentage agreement (how many times the
annotators are observed to assign the same category to the annotation
elements) from agreement that takes the expected agreement into account
(the probability that the annotators agree by chance). Agreement beyond
chance can be measured by Cohen's kappa coefficient κ, calculated as
follows:
κ = (P(A) - P(E)) / (1 - P(E))
where P(A) is the proportion of times the coders agree and P(E) is the
proportion of times they can be expected to agree by chance. The value of
κ is 1 in the case of total agreement and zero when the observed agreement
is no better than what would be expected by chance. According to Rietveld
and van Hout (1993), κ-values above 0.8 show almost perfect agreement,
those between 0.6 and 0.8 substantial agreement, those between 0.4 and 0.6
moderate agreement, those between 0.2 and 0.4 fair agreement, and those
below 0.2 only slight agreement beyond chance. Generally, a value above
0.6 is considered satisfactory.
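To make the calculation concrete, the following Python sketch computes κ for two coders; the function name cohens_kappa, the gesture category labels, and the toy data are illustrative assumptions rather than material from the text. As is standard for Cohen's kappa, P(E) is estimated from the product of each coder's marginal category proportions.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same annotation elements."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # P(A): observed proportion of elements assigned the same category.
    p_a = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # P(E): chance agreement, summed over categories from each coder's
    # marginal category proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_a - p_e) / (1 - p_e)

# Hypothetical example: two coders labelling ten gesture units.
a = ["beat", "deictic", "beat", "iconic", "beat",
     "beat", "deictic", "iconic", "beat", "beat"]
b = ["beat", "deictic", "beat", "beat", "beat",
     "beat", "deictic", "iconic", "beat", "iconic"]
print(round(cohens_kappa(a, b), 2))  # 0.64: substantial agreement
```

Here the coders agree on 8 of 10 elements (P(A) = 0.8), chance agreement is P(E) = 0.44, and κ = 0.36/0.56 ≈ 0.64, i.e. substantial agreement on the scale above.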
However, κ can often be very low whilst percentage agreement is still
quite high, and it has been argued that κ may therefore not be a suitable
statistic for assessing annotator reliability (see Cavicchio and Poesio,
2009, for discussion). For instance, if one of the coders has a strong
preference for a particular category, the likelihood of the coders agreeing
on that category by chance is increased and, consequently, the overall
agreement measured by κ is reduced.
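A short numerical sketch (again with hypothetical category labels) illustrates the effect: when one coder assigns nearly every element to a single category, a raw agreement of 90% can coincide with a κ of zero.

```python
# Coder B labels every one of 20 elements "beat"; coder A deviates twice.
a = ["beat"] * 18 + ["iconic", "iconic"]
b = ["beat"] * 20
p_a = sum(x == y for x, y in zip(a, b)) / 20        # 0.9 observed agreement
p_e = (18 / 20) * (20 / 20) + (2 / 20) * (0 / 20)   # 0.9 expected by chance
print(p_a, p_e, (p_a - p_e) / (1 - p_e))            # kappa = 0.0
```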