rules to predict when a backchannel can occur according to the user's
verbal and non-verbal behavior. They concluded that backchannel
signals (such as head nods or short verbal responses that invite the speaker
to go on) appear at pitch variations in the speaker's voice, whereas the
listener's frowns, body movements and gaze shifts are produced when the
speaker shows uncertainty. The listener also often displays mimicry
behavior during the interaction; for example, the listener mimics
posture shifts, gaze shifts, head movements and facial expressions.
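A rule set of this kind can be sketched as a simple mapping from observed speaker features to candidate listener signals. The function below is purely illustrative: the feature names and the threshold are assumptions, not the actual rules reported in the study.

```python
# Hypothetical sketch of rule-based backchannel selection.
# Feature names and the pitch-variation threshold are invented
# for illustration; they are not the rules from the study above.

def predict_listener_signals(speaker_features):
    """Map observed speaker behavior to candidate listener backchannels."""
    signals = []
    # A marked pitch variation invites a continuer (nod or "uh-huh").
    if speaker_features.get("pitch_variation", 0.0) > 0.5:
        signals.append("head_nod")
        signals.append("short_verbal_response")
    # Speaker uncertainty tends to elicit frowns, movement, gaze shifts.
    if speaker_features.get("shows_uncertainty", False):
        signals.extend(["frown", "body_movement", "gaze_shift"])
    # Mimicry: mirror selected speaker behaviors.
    for behavior in ("posture_shift", "gaze_shift", "head_movement"):
        if speaker_features.get(behavior, False):
            signals.append("mimic_" + behavior)
    return signals
```

In a running agent such a function would be evaluated once per analysis frame, with the returned signals passed to the animation component.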
Later on, Gratch et al. (2007) developed the “Rapport Agent”, an
agent that provides solely non-verbal backchannels when listening.
This agent was implemented to study the level of rapport that users
feel while interacting with a virtual agent capable of providing
backchannel signals. The system analyzes the user's non-verbal
behavior (head nods, head shakes, head movements, mimicry) and
some features of the user's voice to decide when a backchannel should
be displayed. Tests performed with the Rapport Agent showed that
the system can elicit a feeling of rapport in users. In the evaluation,
one participant, the speaker, had to watch a cartoon movie and then tell
the story to another subject, the listener. The speaker and the listener
were separated; the listener could hear the speaker and see her/him
on a screen, while the speaker could see an avatar on a screen (and was
told that the avatar perfectly reproduced the listener's behaviors). The
subjects were randomly assigned to one of the following conditions:
- Responsive: the avatar is controlled by the Rapport Agent system.
- Unresponsive: the avatar is driven by a script that generates random
backchannel signals; as a consequence, its behavior is not related
to the behavior that the speaker is actually displaying.
The results of the test showed that participants spoke more in the
responsive condition than in the unresponsive one. Moreover, the
speakers' speech was more fluent in the responsive condition: in
the unresponsive condition they produced more disfluencies, that is,
filled pauses and stutters.
Recently, Morency et al. (2008) proposed an enhancement of this
type of system, introducing a machine learning method to find which
of the speaker's multimodal features are important and can affect the
timing of the agent's backchannels. The system uses a sequential
probabilistic model to learn how to predict and generate real-
time backchannel signals. The model is designed to work with two
sequential probabilistic models: the Hidden Markov Model and the
Conditional Random Field. To train the prediction model, a corpus of
50 human-to-human conversations was used. From the video and
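As a concrete illustration of framing backchannel prediction as sequence labeling, the sketch below decodes a two-state Hidden Markov Model (backchannel opportunity vs. none) over discretized speaker features with the Viterbi algorithm. All states, observation symbols and probabilities are invented for illustration; a real system would estimate them from a conversation corpus like the one described above.

```python
# Toy HMM for backchannel-opportunity prediction (illustrative only:
# states, symbols and probabilities are assumptions, not trained values).
import math

STATES = ("no_bc", "bc")            # hidden: backchannel opportunity or not
start = {"no_bc": 0.9, "bc": 0.1}
trans = {"no_bc": {"no_bc": 0.8, "bc": 0.2},
         "bc":    {"no_bc": 0.7, "bc": 0.3}}
# Observations: one discretized speaker feature per frame.
emit = {"no_bc": {"speech": 0.7, "pause": 0.2, "pitch_drop": 0.1},
        "bc":    {"speech": 0.1, "pause": 0.4, "pitch_drop": 0.5}}

def viterbi(obs):
    """Return the most likely hidden state sequence (log-space Viterbi)."""
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: V[-1][p] + math.log(trans[p][s]))
            col[s] = V[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Trace the best path backwards through the stored pointers.
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))
```

With these toy parameters, a frame sequence containing a pause followed by a pitch drop is decoded with a backchannel opportunity at the pitch-drop frame, mirroring the kind of timing decision the systems described above make in real time.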