Modeling Human Communication Dynamics for Virtual Human - Coverbal Synchrony in Human-Machine Interaction

4.5 Wisdom of crowds

In many real-life scenarios, it is hard to collect the actual labels for
training, because labeling is expensive or subjective. To address this
issue, a new direction of research has emerged over the last decade that
takes full advantage of the “wisdom of crowds” (Smith et al., 2005). Put
simply, the wisdom of crowds enables the fast acquisition of opinions
from multiple annotators or experts.

Based on this intuition, the wisdom of crowds was modeled using the
Parasocial Consensus Sampling (PCS) paradigm (Huang et al., 2010) for
data acquisition, which allows multiple crowd members to experience the
same situation. The PCS paradigm is based on the theory that people
behave similarly when interacting through a medium (e.g., video
conference).
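Once several crowd members have responded to the same recording, their annotations can be pooled. The following is a minimal sketch of one way to do this, not the procedure of Huang et al.: it assumes per-frame binary annotations (1 = the crowd member gave backchannel feedback) and a simple vote threshold. The function name `consensus_labels`, the parameter `min_votes`, and the data are hypothetical.

```python
from typing import List


def consensus_labels(annotations: List[List[int]], min_votes: int) -> List[int]:
    """Mark a frame as 1 when at least `min_votes` crowd members responded there."""
    if not annotations:
        return []
    n_frames = len(annotations[0])
    if any(len(row) != n_frames for row in annotations):
        raise ValueError("all annotators must label the same number of frames")
    # Count, frame by frame, how many crowd members gave feedback.
    votes = [sum(frame) for frame in zip(*annotations)]
    return [1 if v >= min_votes else 0 for v in votes]


# Hypothetical data: three crowd members responding to the same 6-frame clip.
crowd = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
consensus = consensus_labels(crowd, min_votes=2)
print(consensus)
```

Raising `min_votes` keeps only the frames on which the crowd agrees most strongly, at the cost of discarding less common response patterns.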

The goals of the computational model are to automatically discover the
prototypical patterns of backchannel feedback and to learn the dynamics
between these patterns. This will allow the computational model to
accurately predict the responses of a new listener even if he/she changes
his/her backchannel patterns in the middle of the interaction. It will
also improve generalization by allowing mixtures of these prototypical
patterns.

To achieve these goals, a variant of the Latent Mixture of Discriminative
Experts (LMDE; Ozkan et al., 2010) was proposed to take full advantage of
the wisdom of crowds. The Wisdom-LMDE model is based on a two-step
process: a Conditional Random Field (CRF) is first learned for each
expert, and the outputs of these models are then used as input to a
Latent Dynamic Conditional Random Field (LDCRF, Figure 6) model, which is
capable of learning the hidden structure within the input. In the
Wisdom-LMDE, each expert corresponds to a different listener from the
crowd. Figure 8 shows an overview of the approach.
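The two-step data flow can be illustrated with a small sketch. It is not the Wisdom-LMDE implementation: plain logistic regression stands in for both the per-expert CRF and the LDCRF, so neither sequence structure nor latent states are modeled. Only the flow from per-listener experts to a second model trained on their outputs is shown. The features, the listener names, the labels, and the choice of training target for the second step are all hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def predict_one(w: List[float], xi: List[float]) -> float:
    # w holds one weight per feature followed by a bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])


def predict_proba(w: List[float], X: List[List[float]]) -> List[float]:
    return [predict_one(w, xi) for xi in X]


def train_logreg(X: List[List[float]], y: List[int],
                 epochs: int = 2000, lr: float = 0.5) -> List[float]:
    """Stand-in learner: logistic regression fit by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Hypothetical speaker features per frame: [pause, pitch_drop].
features = [[1, 0], [0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [0, 0]]
# Hypothetical crowd annotations: listener A reacts to pauses, B to pitch drops.
crowd_labels: Dict[str, List[int]] = {
    "listener_a": [1, 0, 0, 1, 0, 1, 0, 0],
    "listener_b": [0, 0, 1, 1, 0, 0, 1, 0],
}
# Hypothetical labels that the second-step model is trained to predict.
target = [1, 0, 1, 1, 0, 1, 1, 0]

# Step 1: learn one expert per crowd listener.
experts = {name: train_logreg(features, y) for name, y in crowd_labels.items()}

# Step 2: the experts' per-frame outputs become the input of the second model.
expert_outputs = [predict_proba(w, features) for w in experts.values()]
stacked = [list(frame) for frame in zip(*expert_outputs)]
fusion = train_logreg(stacked, target)

predictions = [1 if p >= 0.5 else 0 for p in predict_proba(fusion, stacked)]
print(predictions)
```

In this sketch the second model sees one input dimension per listener, so adding a crowd member adds an expert and an input column without changing the rest of the pipeline.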

Table 1 summarizes the experiments comparing the Wisdom-LMDE model with
state-of-the-art approaches for behavior prediction. The Wisdom-LMDE
model achieves the best F1 score. The second-best F1 score is achieved by
the CRF Mixture of Experts, which is the only baseline model that
combines the different listener labels in a late-fusion manner. This
result supports the claim that the wisdom of crowds improves the learning
of prediction models.
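For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for frame-level binary labels. This is an assumption made for illustration, since the text does not say how predicted and actual backchannels were matched in the experiments. The example labels are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positive prediction: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```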

5. Discussion

Modeling human communication dynamics enables the computational study of
different aspects of human behaviors. While a backchannel
