the different communication modes into consideration. To define a
multimodal relation, a basic distinction can be made between two
signs being dependent or independent. Dependent signs can be
compatible or incompatible: compatible signs either complement or
reinforce each other, while incompatible signs express different
contents, as, for example, in irony.
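This taxonomy can itself serve as an annotation category set. As a
minimal sketch (the class and member names below are assumptions, not
taken from any particular scheme), the relations could be encoded as
follows:

    from enum import Enum

    class MultimodalRelation(Enum):
        # Two signs carry unrelated contents.
        INDEPENDENT = "independent"
        # Dependent and compatible: one sign adds content the other lacks.
        COMPLEMENT = "complement"
        # Dependent and compatible: both signs express the same content.
        REINFORCE = "reinforce"
        # Dependent but incompatible: the contents conflict, e.g. irony.
        INCOMPATIBLE = "incompatible"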
The correspondences between different modality levels can also
be within-speaker or across-speakers, depending on whether the
speech and gestures are produced by the same or different speakers.
The interactive nature of communication may require a separate level
of coding, since joint actions are qualitatively different from those
exhibited by a single agent alone. For instance, cooperation on the
construction of shared knowledge cannot be attached as a feature of
the individual agent's behavior, since it is not an action that the agent
just happens to perform simultaneously with a partner but a genuine
joint activity that the agents produce in coordination.
Multimodal expressions can have different time spans (cf. speech
segments and gesturing). An important issue here is the anchoring
between the various modality tracks. This can take place at different
levels: the phoneme, word, phrase, or utterance level. In multimodal
interaction annotations, the smallest speech segment is most often
the word.
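To make the anchoring concrete, the following sketch (hypothetical
names and values, tied to no particular tool) represents modality
tracks whose segments are anchored on a shared timeline in seconds,
with the speech track segmented at the word level:

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        start: float   # anchor: segment onset in seconds
        end: float     # anchor: segment offset in seconds
        label: str     # e.g. a word, or a gesture phase

    @dataclass
    class Track:
        modality: str  # "speech", "gesture", ...
        level: str     # "word", "phrase", "utterance", ...
        segments: list[Segment] = field(default_factory=list)

    # Speech track anchored at the word level, the smallest speech
    # segment used in most multimodal annotations.
    speech = Track("speech", "word", [
        Segment(0.00, 0.35, "take"),
        Segment(0.35, 0.60, "this"),
    ])
    gesture = Track("gesture", "phrase", [
        Segment(0.20, 0.80, "deictic:pointing"),
    ])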
2.4 Annotation tools
Annotation schemes are based on theories of communication, and
they are realized in concrete data annotations. Multimodal annotation
involves several independent data streams which are annotated and
linked together. Multimodal annotation schemes usually agree on the
modality levels (input streams) and on the general descriptive
dimensions for them, but differ in the number and detailed
interpretation of the categories. Such annotations pose technical
challenges for annotation tools and workbenches, and also for the
synchronization of cross-modality phenomena. An annotation tool must
thus be capable of processing multimodal information and of
supporting the fusion of multimodal input streams (speech and
gestures, facial expressions, etc.).
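As one illustration of such synchronization, the sketch below
(assuming each stream is a list of (start, end, label) triples on a
shared timeline; the function name is hypothetical) links segments
across two independently annotated streams by time overlap:

    # Link segments across two independently annotated streams --
    # the kind of cross-modality synchronization a tool must support.
    def overlapping(stream_a, stream_b):
        """Yield label pairs of segments whose time spans intersect."""
        for a_start, a_end, a_label in stream_a:
            for b_start, b_end, b_label in stream_b:
                if a_start < b_end and b_start < a_end:
                    yield (a_label, b_label)

    words = [(0.00, 0.35, "take"), (0.35, 0.60, "this")]
    gestures = [(0.20, 0.80, "deictic:pointing")]

    # Links both words with the temporally overlapping pointing gesture.
    print(list(overlapping(words, gestures)))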
Manual annotation is notorious for being a time- and resource-
consuming task, and different tools and workbenches have been
developed to help annotators. The two most common tools for
speech transcription and analysis are Praat (Boersma and Weenink,
2009) and WaveSurfer (Sjölander and Beskow, 2000), whilst in the