the different communication modes into consideration. To define a
multimodal relation, a basic distinction can be made between two
signs being dependent or independent. Dependent signs can be
compatible or incompatible: compatible signs either complement or
reinforce each other, while incompatible signs express different
contents, as, for example, in irony.
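This taxonomy can itself serve as an annotation category set. As a
minimal sketch (the class and member names below are assumptions, not
taken from any particular scheme), the relations could be encoded as
follows:

    from enum import Enum

    class MultimodalRelation(Enum):
        # Two signs carry unrelated contents.
        INDEPENDENT = "independent"
        # Dependent and compatible: one sign adds content the other lacks.
        COMPLEMENT = "complement"
        # Dependent and compatible: both signs express the same content.
        REINFORCE = "reinforce"
        # Dependent but incompatible: the contents conflict, e.g. irony.
        INCOMPATIBLE = "incompatible"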
The correspondences between different modality levels can also
be within-speaker or across-speakers, depending on whether the
speech and gestures are produced by the same or different speakers.
The interactive nature of communication may require a separate level
of coding, since joint actions are qualitatively different from those
exhibited by a single agent alone. For instance, cooperation on the
construction of shared knowledge cannot be attached as a feature of
the individual agent's behavior, since it is not an action that the agent
just happens to perform simultaneously with a partner but a genuine
joint activity that the agents produce in coordination.
Multimodal expressions can have different time spans (cf. speech
segments and gesturing). An important issue here is the anchoring
between the various modality tracks. This can take place at different
levels: the phoneme, word, phrase, or utterance level. In multimodal
interaction annotations, the smallest speech segment is most often
the word.
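To make the anchoring concrete, the following sketch (hypothetical
names and values, tied to no particular tool) represents modality
tracks whose segments are anchored on a shared timeline in seconds,
with the speech track segmented at the word level:

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        start: float   # anchor: segment onset in seconds
        end: float     # anchor: segment offset in seconds
        label: str     # e.g. a word, or a gesture phase

    @dataclass
    class Track:
        modality: str  # "speech", "gesture", ...
        level: str     # "word", "phrase", "utterance", ...
        segments: list[Segment] = field(default_factory=list)

    # Speech track anchored at the word level, the smallest speech
    # segment used in most multimodal annotations.
    speech = Track("speech", "word", [
        Segment(0.00, 0.35, "take"),
        Segment(0.35, 0.60, "this"),
    ])
    gesture = Track("gesture", "phrase", [
        Segment(0.20, 0.80, "deictic:pointing"),
    ])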
2.4 Annotation tools
Annotation schemes are based on theories of communication, and
they are realized in concrete data annotations. Multimodal annotation
involves several independent data streams which are annotated and
linked together. Multimodal annotation schemes usually agree on the
modality levels (input streams) and on the general descriptive
dimensions for them, but differ in the number and detailed
interpretation of the categories. Such annotations pose technical
challenges for annotation tools and workbenches, and also for the
synchronization of cross-modality phenomena. An annotation tool must
thus be capable of processing multimodal information and of
supporting the fusion of multimodal input streams (speech and
gestures, facial expressions, etc.).
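As one illustration of such synchronization, the sketch below
(assuming each stream is a list of (start, end, label) triples on a
shared timeline; the function name is hypothetical) links segments
across two independently annotated streams by time overlap:

    # Link segments across two independently annotated streams --
    # the kind of cross-modality synchronization a tool must support.
    def overlapping(stream_a, stream_b):
        """Yield label pairs of segments whose time spans intersect."""
        for a_start, a_end, a_label in stream_a:
            for b_start, b_end, b_label in stream_b:
                if a_start < b_end and b_start < a_end:
                    yield (a_label, b_label)

    words = [(0.00, 0.35, "take"), (0.35, 0.60, "this")]
    gestures = [(0.20, 0.80, "deictic:pointing")]

    # Links both words with the temporally overlapping pointing gesture.
    print(list(overlapping(words, gestures)))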
Manual annotation is notorious for being a time- and resource-
consuming task, and different tools and workbenches have been
developed to help annotators. The two most common tools for
speech transcription and analysis are Praat (Boersma and Weenink,
2009) and WaveSurfer (Sjölander and Beskow, 2000), whilst in the