for task commands (e.g., to get some information about one of the graphical objects displayed in the environment) (Martin et al., 2006).
An example of a system managing these two streams is the SmartKom system, which features adaptive confidence measures. While the user is speaking (possibly issuing task commands), the confidence value of the mouth-area recognizer is decreased for the module that detects emotions expressed in the user's facial expression (Wahlster,
2003). The SmartKom system thus uses a mixture of early fusion for
analyzing emotions from facial expressions and speech and late fusion
for analyzing the semantics of utterances.
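To make the notion of adaptive confidence measures concrete, the following Python sketch down-weights emotion hypotheses derived from the mouth area while speech activity is detected; the function reweight_facial_emotion and the mouth_penalty factor are hypothetical illustrations under stated assumptions, not part of SmartKom's actual implementation.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    label: str         # e.g. "joy", "anger"
    confidence: float   # raw recognizer confidence in [0, 1]

def reweight_facial_emotion(hyps, speech_active, mouth_penalty=0.4):
    # While the user speaks, articulation deforms the mouth region, so
    # mouth-area emotion evidence is trusted less (hypothetical factor).
    if not speech_active:
        return hyps
    return [Hypothesis(h.label, h.confidence * mouth_penalty) for h in hyps]

facial = [Hypothesis("joy", 0.8), Hypothesis("anger", 0.3)]
print(reweight_facial_emotion(facial, speech_active=True))

During speech, fused emotion decisions would then rely more on prosodic cues than on the down-weighted facial channel.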
Rich et al. (2010) presented a model of engagement for human-robot interaction that took into account directed gaze, mutual gaze, relevant next contributions, and backchannel behaviors as indicators of engagement in a dialogue. Interestingly, the approach was used
for modeling the behavior of both the robot and the human. As a
consequence, it was able to explain failures in communication from
the perspective of both interlocutors. Their model demonstrates the
close interaction between the communication streams required for
semantic processing and social signal processing because it integrates
multimodal grounding with techniques for measuring experiential
qualities of a dialogue. If communication partners fail to establish a
common understanding of what a dialogue is about, it is very likely
that they will lose interest in continuing the interaction.
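As a rough illustration of how such behaviors might feed an engagement estimate, the sketch below computes the mean time between timestamped connection events; the event vocabulary and the statistic are a simplified stand-in inspired by, not reproduced from, the model of Rich et al. (2010).

from statistics import mean

CONNECTION_EVENTS = {"directed_gaze", "mutual_gaze",
                     "relevant_next_contribution", "backchannel"}

def mean_time_between_events(events):
    # events: list of (timestamp_in_seconds, event_type), sorted by time.
    times = [t for t, kind in events if kind in CONNECTION_EVENTS]
    if len(times) < 2:
        return None  # not enough evidence yet
    gaps = [b - a for a, b in zip(times, times[1:])]
    return mean(gaps)

log = [(0.0, "directed_gaze"), (2.5, "backchannel"), (9.0, "mutual_gaze")]
print(mean_time_between_events(log))  # 4.5

Growing gaps between connection events would indicate declining engagement for either interlocutor, which fits the symmetric use of the model for both robot and human.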
Bosma and André (2004) presented an approach to the joint
interpretation of emotional input and natural language utterances.
Short utterances in particular tend to be highly ambiguous when only the linguistic data is considered. An utterance like “right” may be interpreted as a confirmation or, if intended cynically, as a rejection, and the same holds for the absence of an utterance. To integrate the meanings of
the users' spoken input and their emotional state, Bosma and André
combined a Bayesian network to recognize the user's emotional state
from physiological data, such as heart rate, with weighted finite-state
machines to recognize dialogue acts from the user's speech. The finite-
state machine approach was similar to that presented by Bangalore and
Johnston (2009). However, while Bangalore and Johnston used finite-
state machines to analyze the propositional content of dialogue acts,
Bosma and André focused on the speaker's intentions. Their objective
was to discriminate a proposal from a directive, an acceptance from
a rejection, etc., as opposed to Bangalore and Johnston, who aimed at parsing user commands that are distributed over multiple modalities, each of the modalities conveying partial information. That is, Bosma and André did not expect the physiological modality to contribute to the propositional interpretation of an utterance; instead, the emotional channel served to disambiguate the speaker's communicative intention.
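A rough sketch of this kind of combination is given below: dialogue-act scores (standing in for the output of the weighted finite-state machines) are rescored with an emotion posterior (standing in for the Bayesian network over physiological data). All names, weights, and the compatibility table are invented for the example and are not taken from Bosma and André (2004).

def rescore_dialogue_acts(act_hyps, emotion_post, compatibility):
    # act_hyps:      {dialogue_act: score} from the speech analyzer
    # emotion_post:  {emotion: probability} from the physiological classifier
    # compatibility: {(dialogue_act, emotion): weight}, prior compatibility
    fused = {}
    for act, score in act_hyps.items():
        support = sum(p * compatibility.get((act, emo), 0.5)
                      for emo, p in emotion_post.items())
        fused[act] = score * support
    total = sum(fused.values()) or 1.0
    return {act: v / total for act, v in fused.items()}

acts = {"accept": 0.6, "reject": 0.4}
emotions = {"negative": 0.8, "positive": 0.2}
compat = {("accept", "positive"): 0.9, ("accept", "negative"): 0.2,
          ("reject", "negative"): 0.9, ("reject", "positive"): 0.2}
print(rescore_dialogue_acts(acts, emotions, compat))

Although “right” is lexically compatible with acceptance, the strongly negative emotional state shifts the fused interpretation toward a rejection.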