is sensitive towards the user's feelings. For example, Martinovsky
and Traum (2003) demonstrated by means of user dialogues with
a training system and a telephone-based information system that
many breakdowns in man-machine communication could be avoided
if the machine were able to recognize the emotional state of the user
and respond to it more sensitively. This observation shows that a
system should not only analyze what the user said or gestured but
also consider more subtle cues, such as psychological user states.
With the departure from purely task-based dialogue to more human-
like dialogues that aim to create social experiences, the concept
of multimodal fusion as originally known in the natural language
community has to be extended. We not only need fusion mechanisms
that derive information on the user's intention from multiple modalities,
such as speech, pointing gestures and eye gaze. In addition, fusion
techniques are required that help a system assess how the user perceives
the interaction with it. Accordingly, fusion mechanisms are required not
only at the semantic level, but also at the level of social and emotional
signals. With such systems, any interaction may indeed feature a task-
based component mixed with a social interaction component. These
different components may even be conveyed on different modalities
and overlap in time. It is thus necessary to integrate a deeper semantic
analysis into social signal processing on the one hand and to consider
social and emotional cues in semantic fusion mechanisms on the other.
Both streams of information need to be closely coupled during
fusion since they can both include similar communication channels. For
example, a system may fuse verbal and nonverbal signals to come up
with a semantic interpretation, but the same means of expression may
also be integrated by a fusion mechanism as an indicator of cognitive
load (Chen et al., 2012).
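The coupling described above can be illustrated with a minimal sketch in Python: the same stream of modality events feeds both a semantic fusion component and a social-cue fusion component, and a dialogue manager would consult both results. All class and function names here are hypothetical illustrations, not part of any system cited in this chapter.

    from dataclasses import dataclass

    @dataclass
    class ModalityEvent:
        modality: str      # e.g. "speech", "gesture", "gaze"
        payload: dict      # recognizer output for this event
        start: float       # onset time in seconds
        end: float         # offset time in seconds

    def fuse_semantics(events):
        """Combine co-occurring events into a task-level interpretation,
        e.g. resolving 'put that there' against a pointing gesture."""
        speech = [e for e in events if e.modality == "speech"]
        gestures = [e for e in events if e.modality == "gesture"]
        # Placeholder: a real engine would align deictic expressions with
        # gestures that overlap them in time and emit a semantic frame.
        return {"intent": "unresolved", "speech": speech, "gestures": gestures}

    def fuse_social_cues(events):
        """Interpret the same events as evidence of the user's state,
        e.g. hesitations and averted gaze as indicators of cognitive load."""
        pauses = sum(1 for e in events if e.payload.get("is_pause"))
        aversions = sum(1 for e in events
                        if e.modality == "gaze" and e.payload.get("averted"))
        return {"cognitive_load": "high" if pauses + aversions > 2 else "low"}

    def interpret_turn(events):
        # Both fusion streams consume the same communication channels and
        # are coupled at the level of the resulting interpretation.
        return fuse_semantics(events), fuse_social_cues(events)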
By providing a comparative analysis of semantic fusion of
multimodal utterances and the fusion of social signals, this chapter aims
to give a comprehensive overview of fusion techniques as components
of dialogue systems that aim to emulate qualities of human-like
communication. In the next section, we first present taxonomies for
categorizing fusion techniques, focusing on the relationship between
the individual modalities and the level of integration. Section 3 addresses
the fusion of semantic information, whereas Section 4 is devoted to
the fusion of social signals. To enable a better comparison of issues
handled in the two areas, both sections follow a similar structure. We
first introduce techniques for fusing information at different levels
of abstraction and discuss attempts to come up with standards to
represent information to be exchanged in fusion engines. After that we
discuss challenges that arise when moving from controlled laboratory