audio recordings they extracted several prosodic features, such as
pitch variation, vowel volume, and speech energy variation; lexical
features, such as individual words, incomplete words, and emphasized
words; speaker features, such as gaze behavior; and listener features,
such as backchannel emission. The speaker's features taken into account
can have an individual influence (a single feature triggers a
backchannel, for instance a long pause) or a joint influence (more than
one feature influences the listener's backchannel, for instance a short
pause combined with a glance at the listener). The model takes the
speaker's multimodal features as input and returns a sequence of
listener backchannel probabilities. Within this sequence, the
probability peaks that cross a fixed threshold are selected as good
opportunities to provide a backchannel. Morency and colleagues also
addressed the agent's expressiveness: by varying the threshold they
obtained virtual agents with different levels of expressiveness, that
is, agents that provide more or fewer backchannels during an
interaction.
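The selection step described above can be sketched in a few lines: the model emits a backchannel probability at each time step, and local maxima that cross a fixed threshold are taken as backchannel opportunities. The function name, the probability values, and the threshold below are illustrative assumptions, not taken from the original model.

```python
# Hypothetical sketch of threshold-based backchannel selection: given a
# sequence of listener-backchannel probabilities, keep the indices of
# local peaks that cross a fixed threshold.

def backchannel_opportunities(probs, threshold=0.6):
    """Return indices of local probability peaks at or above the threshold."""
    opportunities = []
    for i in range(1, len(probs) - 1):
        # A local peak: strictly higher than the previous step and at
        # least as high as the next one.
        is_peak = probs[i] > probs[i - 1] and probs[i] >= probs[i + 1]
        if is_peak and probs[i] >= threshold:
            opportunities.append(i)
    return opportunities

# Lowering the threshold yields a more "expressive" agent (more
# backchannels); raising it yields a less expressive one.
probs = [0.1, 0.3, 0.7, 0.4, 0.2, 0.65, 0.9, 0.5, 0.3]
print(backchannel_opportunities(probs, threshold=0.6))  # -> [2, 6]
print(backchannel_opportunities(probs, threshold=0.8))  # -> [6]
```

The varying-expressiveness experiment then amounts to running the same probability sequence through different thresholds.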
Within the European Project SEMAINE 2, autonomous talking
agents have been created. These agents, called Sensitive Artificial
Listener (SAL) agents, operate as chatbots that invite the human
interlocutor to chat and bring him or her into a particular mood.
Despite having limited verbal skills, the SAL agents are designed
to sustain realistic interactions with human users (an example of
an interaction is shown in Figure 1). A particular concern of the
project was to have the agents produce appropriate listening behaviors.
The listener module integrated in the SEMAINE architecture (Schröder et
al., 2011) was proposed by Bevacqua et al. (2012). This module,
called the Listener Intent Planner (LIP), decides when the agent must
provide a backchannel and which signal must be displayed. A set of
probabilistic rules based on the literature (for example, Bertrand et al.
(2006); Ward and Tsukahara (2000)) is used to produce a backchannel
signal when certain visual and/or acoustic behaviors of the speaker are
recognized. To identify these behaviors, the user is continuously
tracked through a video camera and a microphone.
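A probabilistic rule of this kind can be sketched as a predicate on the observed speaker features paired with a firing probability. The rules, feature names, and probability values below are purely illustrative assumptions, not the actual rules of the SEMAINE module.

```python
import random

# Hypothetical sketch of probabilistic backchannel rules in the spirit
# of the LIP: each rule pairs a predicate on the speaker's observed
# behavior with a firing probability. Rule contents and probabilities
# are illustrative only.

RULES = [
    # (rule name, predicate on observed features, firing probability)
    ("pause",      lambda f: f.get("pause_ms", 0) > 500,       0.5),
    ("gaze",       lambda f: f.get("gaze_at_listener", False), 0.4),
    ("pitch_drop", lambda f: f.get("pitch_slope", 0) < -0.2,   0.3),
]

def maybe_backchannel(features, rng=random.random):
    """Return the name of the first rule that matches and fires, else None."""
    for name, predicate, prob in RULES:
        if predicate(features) and rng() < prob:
            return name
    return None
```

For example, a long pause detected in the audio stream would match the first rule and trigger a backchannel with the associated probability, so the agent does not respond identically every time the same behavior is observed.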
When a user's behavior satisfies one of the rules, a backchannel
is triggered. The LIP differs from previous listener models mostly in
the type of backchannels it can generate. Earlier models have
mainly considered reactive backchannels, automatic behaviors
used to show contact and perception, whereas the LIP also generates
responsive backchannels, that is, attitudinal signals able to transmit
the agent's communicative functions. Responsive signals are used to
show, for example, that the agent agrees or disagrees with the user, or
2 http://www.semaine-project.eu/