(Bavelas and Chovil, 2000; Kendon, 2004), and the cognitive process
of the speaker (Pine et al., 2007; Kita and Davies, 2009). Information
expressed by co-aligned verbal and non-verbal behavior is understood
better and achieves its purpose (e.g. inducing a social response) faster.
Embodied Conversational Agents (ECAs), such as those in (Kopp and
Wachsmuth, 2002; Mlakar and Rojc, 2011; Poggi et al., 2005; Thiebaux
et al., 2008), present a paradigm of artificial bodies that can control
and move different body parts, and are capable of communicating
with their voices, faces, hands, and arms (or their full bodies). ECAs
may represent coverbal behavior in the form of a communicative
function and/or by directly representing the semiotic nature of the
spoken dialog (e.g. as an iconic/metaphoric representation, by stressing
the importance of spoken segments, or simply by regulating the flow of
information exchange). The believability of synthetic coverbal behavior
closely relates to the notion of expressivity, namely the ability to
perform continuous, smooth, and context-adaptable communicative
acts that emulate natural movement tendencies and dynamics and stay
in synchrony with the situational context and/or the verbal flow.
Interaction incorporating expressive ECAs has been shown to provide
visual meaning and to benefit the understanding of the spoken words
and actions performed in multimodal interfaces. Although ECAs and
synthetic 'communicative' behavior have been well researched, the
co-alignment of speech and non-verbal expressions still represents an
important and challenging task. The behavior overlaid by such
agents is therefore often limited to lip-sync (Tang et al., 2008; Zorić
and Pandžić, 2008) and facial expressions (Lankes and Bernhaupt,
2011), or is based on behavior generation/realization engines that
incorporate scenarios and/or semantically tagged text (Čereković
and Pandžić, 2011; Krenn et al., 2011; Nowina-Krowicki et al., 2011;
van Oijen et al., 2012). In general, the correlation between verbal
and non-verbal signals originates from the semantic, pragmatic, and
temporal features of the multimodal content (Jokinen, 2009; Kendon,
2000; McNeill, 1992). Some coverbal gestures, such as iconic expressions
(Hadar and Krauss, 1999; Straube et al., 2011), symbolic expressions
(Barbieri et al., 2009), and mimicry (Holler and Wilkin, 2011), are also
tightly interlinked with speech. These gestures may be identified
from the linguistic (semantic) properties of the input text, such as
word type, word-type order, and word affiliation (see the sketch below).
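To make this word-level identification concrete, the following minimal sketch illustrates one way such a mapping could be implemented. Every name and mapping in it (the word-type lexicon, the gesture classes, the identify_candidates function) is a hypothetical illustration, not the rule set of any of the cited systems.

```python
# Hypothetical sketch: rule-based selection of coverbal-gesture candidates
# from word-level linguistic features of the input text. The lexicon and
# gesture classes below are illustrative assumptions only.

from dataclasses import dataclass

# Illustrative mapping from word types to plausible gesture classes
# (categories loosely follow McNeill, 1992; the mapping itself is assumed).
WORD_TYPE_TO_GESTURE = {
    "concrete_noun": "iconic",      # depicts a referent's shape or size
    "abstract_noun": "metaphoric",  # depicts an abstract concept
    "spatial_adverb": "deictic",    # points toward a location
    "emblem": "symbolic",           # culturally fixed form (e.g. "ok")
}

# Tiny hand-crafted word-type table; a real system would use a
# part-of-speech tagger and a semantic lexicon instead.
LEXICON = {
    "ball": "concrete_noun",
    "idea": "abstract_noun",
    "there": "spatial_adverb",
    "ok": "emblem",
}

@dataclass
class GestureCandidate:
    word: str
    position: int        # token index, kept for later temporal alignment
    gesture_class: str

def identify_candidates(tokens):
    """Map each token to a gesture candidate via its word type, if any."""
    candidates = []
    for i, token in enumerate(tokens):
        word_type = LEXICON.get(token.lower())
        gesture = WORD_TYPE_TO_GESTURE.get(word_type)
        if gesture is not None:
            candidates.append(GestureCandidate(token, i, gesture))
    return candidates

if __name__ == "__main__":
    utterance = "The ball is over there".split()
    for c in identify_candidates(utterance):
        print(f"{c.position}: {c.word} -> {c.gesture_class}")
```

A real pipeline would replace the hand-crafted lexicon with a part-of-speech tagger and a semantic lexicon, and would pass the token positions on to a scheduler that aligns gesture strokes with their affiliated words.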
Other coverbal gestures, especially those representing communicative
functions (e.g. indexical and adaptive expressions, Allwood, 2001), have
little (if any) evident semantic or linguistic alignment with the text.
However, they may
still be identified by linguistic fillers (Grenfell, 2011), turn-taking,
and directional signals. Although speech and coverbal gestures are