Figure 2. Affective Listener Alfred: the current user state is perceived using SSI, a framework for social signal interpretation (Wagner et al., 2011b) (upper left window); observed cues are mapped onto the valence and arousal dimensions of a 2D emotion model (upper middle window); values for arousal and valence are combined into a final decision and transformed into a set of Facial Action Coding System (FACS) parameters, which are visualized by the virtual character Alfred (right window).
(A color image of this figure appears in the color plate section at the end of the book.)
The fusion approach is inspired by the one developed for the Augmented Reality Tree. However, while Gilroy et al. (2011) generate one vector per modality, Wagner et al. (2011b) generate one vector per detected event. This prevents sudden leaps in the case of a false detection: since the strength of a vector decreases over time, the influence of older events gradually lessens until it falls below a certain threshold, at which point the event is removed entirely.
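To make the decay mechanism concrete, the following Python sketch illustrates one way such event-based fusion could work. It is a minimal illustration under assumptions, not the actual SSI implementation: the names (EventVector, fuse), the linear decay, the decay rate, and the threshold value are all invented for this example.

    from dataclasses import dataclass

    @dataclass
    class EventVector:
        valence: float    # contribution on the valence axis, in [-1, 1]
        arousal: float    # contribution on the arousal axis, in [-1, 1]
        strength: float   # initial weight of the detected event
        timestamp: float  # detection time, in seconds

    DECAY_PER_SECOND = 0.1  # assumed linear decay rate
    THRESHOLD = 0.05        # assumed cutoff below which events are removed

    def current_strength(ev: EventVector, now: float) -> float:
        # The influence of an event shrinks linearly with its age.
        return ev.strength - DECAY_PER_SECOND * (now - ev.timestamp)

    def fuse(events: list[EventVector], now: float) -> tuple[float, float]:
        # Remove events whose strength has fallen below the threshold.
        active = [ev for ev in events if current_strength(ev, now) >= THRESHOLD]
        events[:] = active
        total = sum(current_strength(ev, now) for ev in active)
        if total == 0:
            return 0.0, 0.0  # neutral state when no active events remain
        # Weighted average: a single false detection is outvoted by the
        # other events and fades gradually instead of causing a sudden leap.
        valence = sum(current_strength(ev, now) * ev.valence for ev in active) / total
        arousal = sum(current_strength(ev, now) * ev.arousal for ev in active) / total
        return valence, arousal

Calling fuse repeatedly with the current time yields a smoothly evolving valence/arousal estimate, which matches the behavior described above.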
5. Exploiting Social Signals for Semantic Interpretation
Few systems combine semantic multimodal fusion for task-based command interpretation with multimodal fusion of social signals. A few studies nevertheless mention interactions between the two communication streams, which arise in users' behaviors. For example, a user may say "Thanks" to a virtual agent and at the same time start a new command using gesture (Martin et al., 2006). In another study on the multimodal behaviors of users interacting with a virtual character embedded in a 3D graphical environment, such concurrent behaviors were also observed. In those cases, speech input was preferred for social communication with the virtual character ("how old are you?"), whereas 2D gesture input was used in parallel for task-related commands.
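One hypothetical way to accommodate such concurrent behaviors is to route each modality to its own interpretation stream rather than forcing a single joint interpretation. The Python sketch below illustrates this idea; it is not drawn from any of the cited systems, and all names (InputEvent, handle_event, the two interpreters) are invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class InputEvent:
        modality: str  # "speech" or "gesture"
        content: str   # recognized utterance or gesture label

    def interpret_social(utterance: str) -> str:
        # Social-communication stream, e.g. small talk with the character.
        return f"[social] respond to: {utterance!r}"

    def interpret_command(gesture: str) -> str:
        # Task stream: map 2D gesture input onto a command in the 3D scene.
        return f"[task] execute command: {gesture!r}"

    def handle_event(event: InputEvent) -> str:
        # Concurrent events are served independently: speech feeds the
        # social stream, gesture feeds the command stream.
        if event.modality == "speech":
            return interpret_social(event.content)
        return interpret_command(event.content)

    # A user asks "how old are you?" while simultaneously drawing a 2D
    # gesture that selects an object; neither stream blocks the other.
    for ev in (InputEvent("speech", "how old are you?"),
               InputEvent("gesture", "circle-select")):
        print(handle_event(ev))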