significantly associated with interest. The highest meaning of interest was equally attributed to smile+ok, nod+ok, nod+ooh and smile+ooh. The highest acceptance was attributed to nod, nod+yeah, nod+ok, nod+ooh, nod+m-mh, smile+ok and tilt+frown+ok, which were not differentiated in terms of degree of attribution. Nod+ok was, however, rated as showing more acceptance than nod+really and nod+no. The highest meaning of disbelief was attributed to shake+yeah, raise left eyebrow+no, raise left eyebrow+really and tilt+frown+no; no difference was observed between these. The highest attribution of understanding was observed for raise eyebrows+ooh, nod+ooh, nod+really, nod+yeah and nod. Raise eyebrows+ooh was not more strongly judged as showing agreement than the other signals. A significant difference was even found between nod+ooh and raise eyebrows+ooh: nod+ooh was more strongly associated with understanding than raise eyebrows+ooh. In conclusion, the first hypothesis was only partially satisfied.
Results showed that the strongest attribution of a meaning is not always conveyed by the multimodal signal obtained by combining the visual and acoustic cues representative of that meaning. For example, disagreement is not more strongly conveyed by the multimodal signal shake+no, as we hypothesized; other signals, like shake and shake+m-mh, convey this meaning as well. This means that the meaning conveyed by a multimodal backchannel cannot simply be inferred from the meaning of each visual and acoustic cue that composes it. It must be considered and studied as a whole to determine the meaning it transmits when displayed by virtual agents. Further results point in the direction of this conclusion. The authors found that some multimodal signals convey a meaning different from the ones associated with their visual and acoustic cues when presented on their own. For example, a high meaning of no interest was attributed to frown+ooh, although in our previous studies (Heylen, 2007) the signal frown was associated mainly with disbelief and, in our preliminary and informal listening test, the vocalization ooh was associated with understanding.
As regards the third hypothesis, the evaluation showed that multimodal signals composed of visual and acoustic cues with strongly opposite meanings are rated as nonsense. Four multimodal signals were significantly rated as nonsense: nod+no, shake+yeah, shake+ok and shake+really. Moreover, it is interesting to notice that a high attribution of nonsense does not necessarily exclude the attribution of other meanings: the high-nonsense signal shake+yeah was also highly judged as showing disbelief. A possible explanation would be that these signals are particularly context dependent.