because it does not seem to have an obvious communicative function,
but it may also be a signal of the agent being bored or having nothing
to say, in which case it carries communicative meaning. Annotations
thus differ depending on whether the gestures are interpreted as
being intentionally communicative by the communicator (displayed or
signaled) (Allwood, 2001), or the gestures are judged (by the annotator)
to have a noticeable effect on the recipient.
Since emerging technology allows the recognition of gestures and
faces via cameras and sensors, it is possible to extract gestures and
facial expressions from the data. The form-features of gestures can
then be automatically linked to appropriate communicative functions.
By combining the top-down approach, i.e. manual annotation and
analysis of the data, with the bottom-up analysis of the multimodal
signals, we can visualize the speakers' communicative behavior, and
also show how synchrony of conversation is constructed through
the interlocutors' activity (Jokinen and Scherer, 2012). The top-down
approach relies on human observation: for example, video recordings are
manually tagged according to an annotation scheme in order to mark
communicatively important events. The bottom-up approach, on the other hand, uses
automatic technological means to recognize, cluster, and interpret the
signals that the communicating agents emit. These two approaches
look at the communicative situations from two opposite viewpoints:
they use different methods and techniques, but the object of study is
the same. Communication models can thus be built and incorporated
into smart applications through top-down human observations and
bottom-up automatic analysis of the interaction, and the approach is
beneficial for both interaction technology and human communication
studies. New algorithms and models will be required for the detection
and processing of speech information along with gestural and facial
expressions, and existing technologies will need to be adapted to
accommodate these advances. Simulations based on manual analysis
of corpora of gestures and facial expressions are already incorporated
within the development of Embodied Conversational Agents (e.g.,
André and Pelachaud, 2010), and a motion capture tool to gather
more precise data about the user's behavior is described in Csapo et
al. (2012).
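To make the combination of the two approaches more concrete, the following sketch clusters automatically extracted gesture form-features (the bottom-up step) and cross-tabulates the resulting clusters with manually annotated communicative functions (the top-down step). The feature values, the function labels, and the choice of k-means clustering are illustrative assumptions only; they are not taken from the studies cited above.

# Illustrative sketch (not from the cited work): cluster automatically
# extracted gesture form-features (bottom-up) and compare the clusters
# with manually annotated communicative functions (top-down).
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical form-features per gesture: [hand velocity, amplitude, duration]
features = np.array([
    [0.9, 0.8, 0.4],   # fast, large, short  -> e.g. an emphatic beat
    [0.8, 0.7, 0.5],
    [0.1, 0.2, 1.5],   # slow, small, long   -> e.g. a hold / hesitation
    [0.2, 0.1, 1.8],
    [0.5, 0.9, 1.0],   # medium, large, long -> e.g. an iconic gesture
    [0.6, 0.8, 1.1],
])

# Hypothetical manual (top-down) annotations of communicative function
manual_labels = ["emphasis", "emphasis", "hesitation",
                 "hesitation", "iconic", "iconic"]

# Bottom-up step: unsupervised clustering of the form-features
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(features)

# Cross-tabulate the clusters against the manual annotation to see how
# well the automatically found groups line up with the functions.
table = Counter(zip(clusters.tolist(), manual_labels))
for (cluster_id, label), count in sorted(table.items()):
    print(f"cluster {cluster_id} <-> {label}: {count}")

In practice the feature vectors would come from camera, sensor, or motion-capture streams rather than being entered by hand, and the cross-tabulation would serve as a check on how closely the automatically discovered groups correspond to the manually annotated communicative functions.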
2.3 Interdependence: Modal and multimodal annotation
Two kinds of annotation of interaction data can be considered. The
first is uni-modal annotation, which is specific to a particular modality,
e.g. dialogue act annotation or gesture annotation, and the second
is multimodal annotation proper, which takes the relation between