Multimodal Fusion in Human-Agent Dialogue - Coverbal Synchrony in Human-Machine Interaction

While semantic fusion is driven by the need to exploit the

complementarity of modalities, fusion techniques in social signal

processing make less explicit use of modality-specific benefits.

Nevertheless, exploiting these modality-specific strengths might help improve the gain obtained

by current fusion techniques. For example, there is evidence that

arousal is recognized more reliably using acoustic information while

facial expressions yield higher accuracy for valence. In addition,

context information may be exploited to adapt the weights to be

assigned to the single modalities. For example, in a noisy environment

less weight might be given to the audio signal. A first attempt to make

use of the complementarity of modalities has been made by Wagner et al.

(2011a). Based on evaluation of training data, experts for every class

of the classification problem are chosen. Then the classes are

rank-ordered, beginning with the worst-classified class across all classifiers

and ending with the best one.
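
The two ideas in this paragraph, context-dependent modality weights and per-class experts with a rank ordering of classes, can be illustrated with a minimal Python sketch. It is not the implementation of Wagner et al. (2011a): all function names, the linear noise-to-weight rule, the use of mean per-class recall as the ranking criterion and the numbers in the usage example are assumptions made for illustration. The sketch also stops at the ranking step, since the passage does not describe how the ranked experts are combined afterwards.

```python
from typing import Dict, List, Tuple


def audio_weight_for_noise(snr_db: float, low: float = 0.0, high: float = 20.0) -> float:
    """Hypothetical context rule: map the signal-to-noise ratio linearly onto [0, 1].

    A noisy environment (low SNR) yields a small weight for the audio modality.
    The thresholds `low` and `high` are illustrative assumptions.
    """
    return min(1.0, max(0.0, (snr_db - low) / (high - low)))


def fuse_weighted(scores: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> Dict[str, float]:
    """Decision-level fusion as a weighted sum of per-modality class scores."""
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("at least one modality needs a positive weight")
    fused: Dict[str, float] = {}
    for modality, class_scores in scores.items():
        w = weights[modality] / total  # normalise so the weights sum to 1
        for label, p in class_scores.items():
            fused[label] = fused.get(label, 0.0) + w * p
    return fused


def select_experts(recall: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Pick one expert classifier per class and rank the classes, worst first.

    recall[classifier][label] is the per-class recall measured on training data.
    Assumption: a class's difficulty is its mean recall across all classifiers.
    """
    labels = sorted({label for per_class in recall.values() for label in per_class})

    def mean_recall(label: str) -> float:
        return sum(recall[c][label] for c in recall) / len(recall)

    ordered = sorted(labels, key=mean_recall)  # worst-classified class first
    return [(label, max(recall, key=lambda c: recall[c][label])) for label in ordered]


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any study.
    scores = {"audio": {"pos": 0.2, "neg": 0.8},
              "video": {"pos": 0.7, "neg": 0.3}}
    weights = {"audio": audio_weight_for_noise(5.0), "video": 1.0}
    print(fuse_weighted(scores, weights))

    recall = {"audio": {"high_arousal": 0.8, "positive_valence": 0.5},
              "video": {"high_arousal": 0.6, "positive_valence": 0.7}}
    print(select_experts(recall))
```

With equal weights the audio classifier dominates the decision in the usage example; lowering the audio weight for a noisy context shifts the fused decision toward the video classifier, which is the effect the paragraph describes.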

Figure 1. Different fusion mechanisms: (a) semantic fusion, (b) feature-level fusion and (c)

decision-level fusion.
