Modeling Human Communication Dynamics for Virtual Human - Coverbal Synchrony in Human-Machine Interaction

language technology that provides incremental interpretation of

partial utterances, such as the work of DeVault et al. (2011), which

provides a semantic interpretation, a measure of confidence in the

current understanding and a measure of whether continued listening

will lead to better understanding. The virtual human's reaction to the

understanding is a valenced reaction to the evolving interpretation of

the speaker's utterance. For example, if the virtual human interprets

the speaker's partial utterance as deliberately proposing an action to

harm the virtual human, then the reaction will be anger.
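
The reaction step described above can be sketched in Python. This is only an illustrative sketch: the class `PartialUnderstanding`, its field names, the 0.5 confidence threshold, and the "confusion" and "neutral" fallbacks are assumptions made for the example and are not taken from the cited work. Only the anger rule comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartialUnderstanding:
    """Hypothetical container for one incremental interpretation result."""
    confidence: float             # confidence in the current understanding, 0..1
    keep_listening_helps: bool    # would continued listening improve understanding?
    harms_listener: bool = False  # interpreted action would harm the virtual human
    deliberate: bool = False      # speaker is judged to propose it deliberately


def valenced_reaction(u: PartialUnderstanding,
                      min_confidence: float = 0.5) -> Optional[str]:
    """Return a reaction label, or None when the listener should keep waiting."""
    if u.confidence < min_confidence:
        # Assumed policy: wait if more speech is expected to help,
        # otherwise show confusion.
        return None if u.keep_listening_helps else "confusion"
    if u.harms_listener and u.deliberate:
        return "anger"  # the example given in the text
    return "neutral"    # assumed placeholder for all other interpretations
```

Because the function is called on each partial utterance, the reaction can change as the interpretation evolves, for instance from no reaction to anger once the harmful proposal becomes clear.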

The model analyzes this information and triggers relevant listener

feedback rules, which are mapped to appropriate nonverbal behaviors,

such as nods for generic feedback and expressions of confusion,

comprehension, happiness or anger for the specific feedback. These

behaviors are also conditional on the listener's roles and goals. In

particular, a listener that is the main addressee and has the goals

of participating in and understanding the conversation will engage

in mutual gaze with the speaker, nod to signal attention, and signal both

comprehension of and reaction to the content of the utterance. On the

other hand, an eavesdropper that has the goal of avoiding participation

in the conversation will avoid mutual gaze and refrain from signaling attention

with nods.
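
The way roles and goals condition listener behavior might be sketched as a small rule function. The sketch below is hypothetical: the `Listener` class, the goal strings, and the behavior labels are names invented for illustration, and the two rules cover only the addressee and eavesdropper cases described above, not the full rule set of the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Listener:
    role: str  # "addressee" or "eavesdropper", the two roles named in the text
    goals: Set[str] = field(default_factory=set)  # hypothetical goal labels


def listener_behaviors(listener: Listener,
                       reaction: Optional[str] = None) -> List[str]:
    """Map a listener's role and goals to nonverbal behavior labels."""
    behaviors: List[str] = []
    if (listener.role == "addressee"
            and {"participate", "understand"} <= listener.goals):
        behaviors.append("mutual_gaze")  # engage with the speaker
        behaviors.append("nod")          # generic feedback: signal attention
        if reaction is not None:
            # specific feedback, e.g. confusion, comprehension, happiness, anger
            behaviors.append(f"express_{reaction}")
    elif (listener.role == "eavesdropper"
            and "avoid_participation" in listener.goals):
        behaviors.append("avert_gaze")   # no mutual gaze, no nods
    return behaviors
```

For example, `listener_behaviors(Listener("addressee", {"participate", "understand"}), "confusion")` yields mutual gaze, a nod, and an expression of confusion, whereas an eavesdropper with the goal of avoiding participation only averts gaze.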

4. Interpersonal Dynamic: Speaker and Listener Interaction

A great example of interpersonal dynamics is backchannel feedback,

the nods and para-verbals such as “uh-huh” and “mm-hmm” that

listeners produce as someone is speaking (Watzlawick et al., 1967).

They can express a degree of connection between listener

and speaker (e.g., rapport), show acknowledgement (e.g.,

grounding) or signify agreement.

Backchannel feedback is an essential and predictable aspect of natural

conversation and its absence can significantly disrupt participants'

ability to communicate (Bavelas et al., 2000). Accurately recognizing

the backchannel feedback from one individual is challenging since

these conversational cues are subtle and vary between people. Learning

how to predict backchannel feedback is a key research problem for

building immersive virtual humans and robots. Finally, there are still

some unanswered questions in linguistics, psychology and sociology on

what triggers backchannel feedback and how it differs across

cultures. In this chapter we show the importance of modeling both

the multimodal and interpersonal dynamics of backchannel feedback

for recognition, prediction and analysis.
