3.2 Multimodal signals
As described in Section 2.1.1, backchannels are provided not only
through the visual modality, but also through voice, by uttering
paraverbals, words or short sentences (Gardner, 1998; Allwood et
al., 1993). For this reason, such signals must be taken into account
to create credible virtual agents. Bevacqua et al. (2010)
proposed to improve user-agent interaction by introducing multimodal
signals into the backchannels performed by the ECA Greta. Moreover,
they presented a perceptual study aimed at gaining a better
understanding of how multimodal backchannels are interpreted
by users. As in their previous studies (Bevacqua et al., 2007; Heylen
et al., 2007), video clips of a virtual agent performing context-free
multimodal backchannel signals were shown. The participants were
asked to assign none, one or several meanings to each signal. Again,
the meanings proposed were: agreement, disagreement, acceptance,
refusal, interest, no interest, belief, disbelief, understanding, not
understanding, liking, not liking.
To create the videos, seven visual cues (raise eyebrows, nod, smile,
frown, raise left eyebrow, shake and tilt+frown) and eight acoustic
cues (seven vocalizations plus silence: ok, ooh, gosh, really, yeah, no,
m-mh and (silence)) were selected. The visual cues were chosen among
those studied in previous evaluations (Bevacqua et al., 2007; Heylen et
al., 2007), whereas the vocalizations were selected through an informal
listening test (see Bevacqua et al. (2010) for more details). The authors
hypothesized that (i) a multimodal signal created by the combination
of visual and acoustic cues representative of a meaning would obtain
the strongest attribution of that meaning; (ii) the meaning conveyed
by the acoustic and visual cues taken separately can differ from
the meaning transmitted by their combination; and (iii) multimodal
signals obtained by combining visual and acoustic cues that
have strongly opposite meanings would be rated as nonsense (as, for instance,
nod+no, shake+ok, shake+yeah).
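As a rough illustration of how such a stimulus set can be assembled, the sketch below enumerates the cross-product of the visual and acoustic cues listed above and flags the combinations hypothesized to be contradictory in (iii). The cue names are taken from the text; treating the stimuli as the full 7 x 8 cross-product is an assumption made for illustration only, since the study may have used just a subset of the combinations.

```python
from itertools import product

# Cue inventories as listed above; "silence" stands for the vision-only condition.
visual_cues = ["raise eyebrows", "nod", "smile", "frown",
               "raise left eyebrow", "shake", "tilt+frown"]
acoustic_cues = ["ok", "ooh", "gosh", "really", "yeah", "no", "m-mh", "silence"]

# Pairs hypothesized to be contradictory in hypothesis (iii).
hypothesized_nonsense = {("nod", "no"), ("shake", "ok"), ("shake", "yeah")}

# Full cross-product of cues: 7 x 8 = 56 candidate multimodal signals
# (an assumption; the actual stimulus set may be smaller).
signals = [
    {"visual": v, "acoustic": a,
     "expected_nonsense": (v, a) in hypothesized_nonsense}
    for v, a in product(visual_cues, acoustic_cues)
]

print(len(signals))                                   # 56 candidate signals
print(sum(s["expected_nonsense"] for s in signals))   # 3 hypothesized-nonsense pairs
```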
The evaluation was performed in English, and 55 participants accessed
it anonymously through a web browser in which the multimodal signals
were played one at a time. Participants used a bipolar 7-point Likert
scale for each meaning, from -3 (extremely negative attribution) to +3
(extremely positive attribution). Assigning 0 to all dimensions meant
that a participant could not find a meaning among those proposed for the
given signal. Participants could also judge the signal as complete nonsense.
For each meaning, the 95% confidence interval of the ratings was calculated.
Table 2 reports all signals for which the mean was significantly above
zero (for positive meanings) or below zero (for negative meanings).
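As a hedged illustration of this analysis step, the sketch below computes a per-meaning mean and 95% confidence interval and keeps only attributions whose interval excludes zero, mirroring the selection criterion described for Table 2. The use of a t-based interval via SciPy is an assumption about the exact procedure, and the ratings array is invented for the example.

```python
import numpy as np
from scipy import stats

def mean_ci(ratings, confidence=0.95):
    """Mean and 95% confidence interval of Likert ratings (-3..+3) for one meaning."""
    ratings = np.asarray(ratings, dtype=float)
    mean = ratings.mean()
    sem = stats.sem(ratings)                      # standard error of the mean
    half_width = sem * stats.t.ppf((1 + confidence) / 2.0, len(ratings) - 1)
    return mean, mean - half_width, mean + half_width

# Hypothetical ratings for one signal/meaning pair, one value per participant.
ratings = [2, 1, 3, 0, 2, 1, 2, -1, 1, 2]
mean, lo, hi = mean_ci(ratings)

# The attribution is retained only if the interval excludes zero:
# significantly positive -> the meaning is conveyed,
# significantly negative -> the opposite pole is conveyed.
if lo > 0:
    print(f"positive attribution (mean {mean:.2f}, CI [{lo:.2f}, {hi:.2f}])")
elif hi < 0:
    print(f"negative attribution (mean {mean:.2f}, CI [{lo:.2f}, {hi:.2f}])")
else:
    print("no reliable attribution")
```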