automatically generate the animation of two agents interacting in the
same graphic window. The system computes the behavior of the agent
playing the listener, inserting non-verbal backchannel signals (such as
a head nod, an eyebrow raise, or a smile) according to probabilistic
rules triggered by events in the speaker's channel, such as pauses and
particular pitch accents.
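
The description above suggests a simple event-driven procedure. The
following Python sketch illustrates one possible reading: a backchannel
signal is inserted probabilistically at each event detected in the
speaker's channel. The event types, probabilities, and signal inventory
are assumptions made for illustration, not values from the system itself.

import random

# Hypothetical trigger probabilities per event type and signal inventory.
BACKCHANNEL_PROB = {"pause": 0.4, "pitch_accent": 0.25}
SIGNALS = ["head_nod", "eyebrow_raise", "smile"]

def listener_signals(events):
    """events: (time_ms, event_type) pairs detected in the speaker's channel.
    Returns (time_ms, signal) pairs for the agent playing the listener."""
    planned = []
    for time_ms, event_type in events:
        if random.random() < BACKCHANNEL_PROB.get(event_type, 0.0):
            planned.append((time_ms, random.choice(SIGNALS)))
    return planned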
Ward and Tsukahara (2000) provided evidence for the assumption
that backchannel signals are often produced at moments when the
speaker can perceive them more easily. They proposed a model based
on acoustic cues to determine the right moment to provide a
backchannel: a signal is produced when a region of low pitch lasting
110 ms is detected after at least 700 ms of speech, provided that no
backchannel has been displayed within the preceding 800 ms. To
evaluate their system, they tested it on a corpus of pre-recorded
dyadic conversations. Results showed that the system (based on the
low-pitch rule) predicted the occurrence of backchannel signals better
than chance: the accuracy was 18% versus 13% for English and 34%
versus 24% for Japanese.
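
The low-pitch rule lends itself to a streaming implementation. The
sketch below is one possible Python rendering; only the 110 ms, 700 ms,
and 800 ms constraints come from the description above, while the frame
size, the pitch threshold, and all names are assumptions.

LOW_PITCH_MIN_MS = 110   # low-pitch region must last this long (from the text)
SPEECH_MIN_MS = 700      # after at least this much speech (from the text)
REFRACTORY_MS = 800      # no backchannel within this window (from the text)

class LowPitchBackchannelDetector:
    """Streaming version of the low-pitch rule; the thresholds above come
    from the description, everything else is an illustrative assumption."""

    def __init__(self, low_pitch_hz=100.0, frame_ms=10):
        self.low_pitch_hz = low_pitch_hz  # hypothetical, speaker-dependent
        self.frame_ms = frame_ms
        self.low_pitch_run_ms = 0
        self.speech_run_ms = 0
        self.since_last_bc_ms = REFRACTORY_MS  # allow a signal immediately

    def on_frame(self, pitch_hz, is_speech):
        """Consume one analysis frame; return True to emit a backchannel."""
        self.since_last_bc_ms += self.frame_ms
        if is_speech:
            self.speech_run_ms += self.frame_ms
            if pitch_hz is not None and pitch_hz < self.low_pitch_hz:
                self.low_pitch_run_ms += self.frame_ms
            else:
                self.low_pitch_run_ms = 0
        else:
            self.speech_run_ms = 0
            self.low_pitch_run_ms = 0
        if (self.low_pitch_run_ms >= LOW_PITCH_MIN_MS
                and self.speech_run_ms >= SPEECH_MIN_MS
                and self.since_last_bc_ms >= REFRACTORY_MS):
            self.since_last_bc_ms = 0
            self.low_pitch_run_ms = 0
            return True
        return False

Processing the signal one frame at a time keeps the rule usable in real
time, which is what makes it suitable for a live listener agent.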
Fujie et al. (2004) presented the humanoid robot ROBISUKE, a
conversational robot able to provide appropriate feedback to the user
before the end of an utterance. The robot uses the spoken dialogue
system developed by Nakano et al. (1999) to determine the content of
the feedback according to the content of the user's speech. The system
employs a network of finite state transducers to link recognized words
to content; it then extracts prosodic information to determine the
right timing of the feedback.
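
The word-to-content mapping can be pictured with a toy transducer. In
the Python fragment below, a finite state transducer maps recognized
word sequences to feedback content; the states, vocabulary, and outputs
are invented for illustration and do not reproduce ROBISUKE's actual
network.

# Transitions: (state, recognized_word) -> (next_state, feedback_content).
FST = {
    ("start", "tomorrow"): ("time_given", None),
    ("time_given", "meeting"): ("topic_given", "acknowledge"),
    ("topic_given", "cancelled"): ("done", "express_surprise"),
}

def feedback_for(words):
    """Feed recognized words through the transducer, collecting the
    feedback content to express; unknown words leave the state unchanged."""
    state, feedback = "start", []
    for word in words:
        state, content = FST.get((state, word), (state, None))
        if content is not None:
            feedback.append(content)
    return feedback

print(feedback_for(["tomorrow", "meeting", "cancelled"]))
# prints ['acknowledge', 'express_surprise']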
Thórisson (1997) developed a virtual agent, called Gandalf, capable
of interacting with users through verbal and non-verbal signals.
Gandalf is provided with a face and a hand and has knowledge about
the solar system; its interaction with users consists of providing
information about the universe. The solar system is displayed on the
same screen where Gandalf stands, and the agent can travel from planet
to planet, telling the user facts about each one. During the
interaction with the user, Gandalf is able not only to display facial
expressions, attentional cues (like gazing at the user or at the
object it is talking about) and appropriate turn-taking behaviors, but
also to produce real-time backchannel signals. To generate
backchannels, the system evaluates the duration of pauses in the
speaker's speech: a backchannel (a short utterance or a head nod) is
displayed when a pause longer than 110 ms is detected. Gandalf is
based on a multi-layer multimodal architecture that endows the agent
with multimodal perception and action-generation skills; each layer
has sufficient information to decide which action to perform at a
specific time.
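
As a rough illustration of the pause-based trigger, the following
Python sketch emits one backchannel per detected pause longer than
110 ms. The frame size, the one-per-pause policy, the choice between a
nod and a short utterance, and all names are assumptions; only the
110 ms threshold comes from the description above.

import random

PAUSE_MIN_MS = 110  # from the description above

def backchannels_for(frames, frame_ms=10):
    """frames: iterable of booleans, True if the speaker talks in the frame.
    Yields (time_ms, signal) pairs, one backchannel per detected pause."""
    t_ms = 0
    silence_ms = 0
    fired = False
    for is_speech in frames:
        t_ms += frame_ms
        if is_speech:
            silence_ms, fired = 0, False
        else:
            silence_ms += frame_ms
            if silence_ms > PAUSE_MIN_MS and not fired:
                fired = True
                yield t_ms, random.choice(["head_nod", "short_utterance"])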