scale. Besides information about the user's voice intonation (obtained
through an automatic intonation analysis), the system collects data about
the user's hand gestures and gaze direction, recorded by a human observer
in a Wizard of Oz setting. Such data is used to automatically control
the virtual agent's gaze and its turn-taking behavior.
Another backchannel system has been implemented by Cassell
and colleagues for REA, the Real Estate Agent (Cassell and Thórisson,
1999). REA is a virtual humanoid whose task consists in showing users
the characteristics of houses displayed behind her. She interacts with
users through verbal and non-verbal behaviors; REA's text-to-speech
synthesizer allows her to vary the intonation of her voice. As with Gandalf,
REA's responses are generated on the basis of a pause duration model.
She provides a backchannel signal at each pause that lasts more than
500 ms (Cassell and Thórisson, 1999). Also like Gandalf, REA can emit
backchannels as short utterances and head nods, but in addition she
can show puzzlement (for example by raising her eyebrows), asking for
repair when she does not understand what the speaker says.
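The pause-duration rule described above can be illustrated with a minimal sketch. It is not REA's actual implementation: the Pause type, the function name, and the choice to emit the signal at the moment the pause exceeds the threshold are assumptions made for illustration. Only the 500 ms threshold comes from the text.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_MS = 500  # threshold reported in the text


@dataclass
class Pause:
    """A silence in the user's speech (hypothetical representation)."""
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def backchannel_times(pauses, threshold_ms=PAUSE_THRESHOLD_MS):
    """Return the times (ms) at which a pause-duration rule would fire.

    Assumption: the signal is emitted as soon as a pause has lasted
    threshold_ms, and only for pauses strictly longer than the threshold.
    """
    times = []
    for pause in pauses:
        if pause.duration_ms > threshold_ms:
            times.append(pause.start_ms + threshold_ms)
    return times


if __name__ == "__main__":
    pauses = [Pause(0, 300), Pause(1000, 1600), Pause(2000, 2500)]
    print(backchannel_times(pauses))
```

In this example only the second pause (600 ms) is longer than the threshold, so a single backchannel is scheduled.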
Cathcart et al. (2003) proposed a model based on pause duration.
They presented a shallow model that uses human dialogue data for
predicting where backchannel signals should appear. This dialogue
data was extracted from the HCRC Map Task Corpus, a set of 128
task-oriented dialogues between English speakers. From the analysis of
this corpus, Cathcart and colleagues found that backchannels can
be expected at phrase boundaries and that these boundaries occurred
every five to fifteen syllables (Knowles et al., 1996). On the basis of
this result, the system inserted a backchannel every n words (they
approximated syllables by words), where n was determined by the
frequency of backchannels that occurred in the data. They evaluated
their model in three different situations: (1) the model simply inserted a
backchannel signal every n words; (2) the model provided a backchannel
only when a pause longer than a certain duration was detected; (3) the
model integrated both methods. Results showed that integrating
the two approaches noticeably increased the accuracy of backchannel
prediction.
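The three evaluated conditions can be sketched as follows. This is an illustrative reading, not the code of Cathcart et al.: the token representation (a word paired with the pause that follows it, in ms), the function names, and the sample values are hypothetical. The text does not say how the two methods were integrated or exactly how n was derived, so the combination rule (fire at a long pause, or once n words have passed since the last backchannel) and the ratio used in estimate_n are assumptions.

```python
def estimate_n(total_words, total_backchannels):
    """Assumed way to derive n: average number of words per backchannel."""
    return max(1, round(total_words / total_backchannels))


def every_n_words(tokens, n):
    """Condition 1: a backchannel after every n-th word."""
    return [i for i in range(len(tokens)) if (i + 1) % n == 0]


def after_long_pauses(tokens, min_pause_ms):
    """Condition 2: a backchannel after each word followed by a long pause."""
    return [i for i, (_, pause_ms) in enumerate(tokens) if pause_ms > min_pause_ms]


def combined(tokens, n, min_pause_ms):
    """Condition 3 (assumed integration): fire at a long pause, or when
    n words have passed since the last backchannel."""
    positions = []
    words_since_last = 0
    for i, (_, pause_ms) in enumerate(tokens):
        words_since_last += 1
        if pause_ms > min_pause_ms or words_since_last >= n:
            positions.append(i)
            words_since_last = 0
    return positions


if __name__ == "__main__":
    # (word, pause after the word in ms); values are made up for illustration
    tokens = [("go", 0), ("left", 600), ("past", 0), ("the", 0),
              ("old", 0), ("mill", 0), ("then", 700), ("stop", 0)]
    print(every_n_words(tokens, 3))
    print(after_long_pauses(tokens, 500))
    print(combined(tokens, 3, 500))
```

Each function returns the indices of the words after which a backchannel would be inserted, which makes the three conditions directly comparable on the same utterance.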
4.3 Multimodal cues-based models
Several studies have shown that the speaker's non-verbal behavior can
help determine when the listener could provide a backchannel signal
during a conversation. In certain conditions, for example
when the conversation is progressing smoothly and successfully,
people tend to synchronize with one another, as in a dance. Based on this
fact, Maatman et al. (2005) derived from the literature a list of useful