an appropriate feedback signal, such as a frown, to encourage more
information from the user. As a consequence, the fusion process in this
system may extend over a sequence of turns in a multimodal dialogue.
4. Multimodal Interfaces Featuring Fusion of Social Signals
Recently, the automatic recognition of social and emotional cues
has shifted from a side issue to a major topic in human-computer
interaction. The aim is to enable a very natural form of interaction by
considering not only explicit instructions by human users, but also more
subtle cues, such as psychological user states. A number of approaches
to automated affect recognition have been developed exploiting a
variety of modalities including speech (Vogt and André, 2005), facial
expressions (Sandbach et al., 2012), body postures and gestures
(Kleinsmith et al., 2011) as well as physiological measurements (Kim
and André, 2008). Multimodal approaches to improving emotion
recognition accuracy have also been reported, most of them exploiting
audiovisual combinations. Results suggest that integrating information
from audio and video improves classification reliability over a single
modality, even with fairly simple fusion methods.
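One such simple fusion method can be sketched as follows: each modality independently produces per-class probabilities, and a weighted average combines them. The class labels, scores, and weights below are invented for illustration.

```python
# Decision-level fusion sketch: average the class probabilities produced
# independently by an audio and a video classifier, then pick the best class.
# Labels, scores, and weights are illustrative, not from any real system.

def fuse_late(audio_probs, video_probs, w_audio=0.5, w_video=0.5):
    """Weighted average of per-class probabilities from two modalities."""
    fused = {label: w_audio * audio_probs[label] + w_video * video_probs[label]
             for label in audio_probs}
    return max(fused, key=fused.get)

# Audio alone is ambiguous between anger and joy; video tips the balance.
audio = {"anger": 0.45, "joy": 0.40, "neutral": 0.15}
video = {"anger": 0.20, "joy": 0.55, "neutral": 0.25}
print(fuse_late(audio, video))  # joy
```

Even this naive averaging can correct a single modality's error whenever the other modality is confident, which is one intuition behind the reported reliability gains.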
In this section, we will discuss applications with virtual humans
and social robots that make use of mechanisms for fusing social and
emotional signals. We will start off by discussing a number of design
decisions that have to be made for the development of such systems.
4.1 Techniques for fusing social signals
Automatic sensing of emotional signals in real-time systems usually
follows a machine learning approach and relies on an extensive set of
labeled multimodal data. Typically, such data are recorded in separate
sessions during which users are asked to show certain actions or
interact with a system that has been manipulated to induce the desired
behavior. Afterward, the collected data are manually labeled by human
annotators with the assumed user emotions. In this way, a large corpus
of labeled data is obtained, on which classifiers are trained and tested.
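The train-and-test cycle on such labeled data can be illustrated with a minimal sketch; a nearest-centroid classifier stands in here for whatever learner a real system would use, and the feature values and emotion labels are invented.

```python
# Minimal train/test sketch on manually labeled recordings.
# Each sample is (feature_vector, label); features and labels are made up.

def train_centroids(samples):
    """Compute one mean feature vector (centroid) per emotion label."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def classify(centroids, vec):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vec))
    return min(centroids, key=dist)

# Labeled recordings: [pitch_mean, energy] annotated with an emotion.
train = [([0.9, 0.8], "anger"), ([0.8, 0.9], "anger"),
         ([0.2, 0.3], "neutral"), ([0.3, 0.2], "neutral")]
model = train_centroids(train)
print(classify(model, [0.85, 0.85]))  # anger
```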
An obvious approach to improving the robustness of the classifiers is
the integration of data from multiple channels. Hence, an important
design decision concerns the level at which the individual modalities
should be fused.
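The two common options can be contrasted in a short sketch: feature-level (early) fusion joins the modality features before a single classifier sees them, while decision-level (late) fusion combines the outputs of per-modality classifiers. The feature vectors and class scores below are hypothetical.

```python
# Contrast of the two fusion levels; all values are illustrative.

def fuse_features(audio_feats, video_feats):
    """Feature-level (early) fusion: concatenate into one joint vector,
    which a single classifier would then be trained on."""
    return audio_feats + video_feats

def fuse_decisions(audio_probs, video_probs):
    """Decision-level (late) fusion: combine per-class scores produced
    by separate classifiers, one per modality."""
    return {c: 0.5 * (audio_probs[c] + video_probs[c]) for c in audio_probs}

print(fuse_features([0.7, 0.1], [0.3, 0.9, 0.2]))  # [0.7, 0.1, 0.3, 0.9, 0.2]
```

Early fusion lets the classifier exploit cross-modal correlations but requires synchronized features; late fusion keeps the modalities independent and tolerates different feature rates.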
A straightforward approach is to simply merge the features
calculated from each modality into one cumulative structure, extract