human's skeleton. However, one of the difficulties of using this method
is that the audio of the utterance in combination with the associated
facial movements must be synchronized with the body animation. In
other words, the body movement, facial movement and audio track must
be synchronized together exactly as they were captured. This means
capturing the facial animation performance at the same time as the body
movement (Stone et al., 2004), which typically requires three different
capture systems: a motion capture system for the body, a separate one
for the face, and a third audio capture for the utterance. In order to
simplify this process, the facial performance can be captured during a
separate performance, but doing so risks a lack of synchronization with
the original body performance. The best level of quality is achieved
by capturing and playing back the performance through all three
capture systems together. However, each system requires different processes in
order to obtain data that can be reused on a virtual character. Thus,
it takes a large amount of effort to integrate synchronized data from
three separate systems together. In addition, the high level of quality
comes at the expense of specificity; the performance is only meaningful
in contexts similar to those during the recorded session. For example,
consider an actor whose performance in a dialogue is captured and
synthesized onto a virtual character. By only capturing one of the two
actors and synthesizing that performance onto a virtual human, you
risk misapplying subtleties of the performance that arose in response
to the presence or movements of the other actor. A recorded interaction
might be subtly different when reused in a different conversation with
a different person. Conversational energy,
timing, backchannelling and even gaze can be different with different
partners. Thus, the high level of quality achieved by replaying an actor's
performance can be limited in its use outside of the original context.
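The integration effort described above, i.e. combining body motion, facial motion and audio from three separate capture systems, amounts to aligning three streams that share a timeline but differ in start time and sample rate. The sketch below illustrates one simple alignment strategy (trimming all streams to the latest shared start time); the stream names, rates and `align` helper are illustrative assumptions, not the API of any particular capture system.

```python
# Hypothetical sketch: synchronize three independently captured streams
# (body mocap, face mocap, audio) by a shared timecode. All names and
# rates are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Stream:
    name: str
    start_time: float   # timecode of first sample, in seconds
    rate: float         # samples per second
    samples: list       # captured frames/samples

def align(streams, t0=None):
    """Trim each stream so all begin at the latest shared start time."""
    if t0 is None:
        t0 = max(s.start_time for s in streams)
    aligned = []
    for s in streams:
        skip = round((t0 - s.start_time) * s.rate)  # samples to drop
        aligned.append(Stream(s.name, t0, s.rate, s.samples[skip:]))
    return aligned

body  = Stream("body",  start_time=0.00, rate=120.0, samples=list(range(1200)))
face  = Stream("face",  start_time=0.50, rate=60.0,  samples=list(range(600)))
audio = Stream("audio", start_time=0.25, rate=100.0, samples=list(range(1000)))

for s in align([body, face, audio]):
    print(s.name, s.start_time, len(s.samples))
```

In practice a common timecode generator (or a clap/sync marker) provides the shared `start_time` values; resampling to a common rate would follow the same pattern.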
2.3 Hub-and-spoke architectures
As an alternative to reusing a motion captured performance directly,
a hub-and-spoke architecture can be used to achieve greater reuse
of speech, gestures and facial performance. This method uses an
underlying base, or idle pose, and blends gesture motions that start
and end in a similar position as the idle pose (Shapiro, 2011). For
example, an actor will perform a number of gestures starting from the
base pose, performing the gesture, and then returning to the base pose.
Thus, each gesture can be replayed on a virtual human in a different
order with other gestures, each starting and ending at the same base
pose, which is usually implemented as a continuous idle posture. This
method allows you to synthesize an arbitrary sequence of gestures,
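The hub-and-spoke sequencing described above can be sketched as follows. This is a minimal one-dimensional illustration, not any particular system's implementation: the pose values, blend lengths and helper names are all assumptions, and real clips would be vectors of joint angles rather than single numbers.

```python
# Minimal sketch of hub-and-spoke gesture sequencing: every gesture
# clip starts and ends at a shared base (idle) pose, so clips can be
# replayed in any order, blending through the idle posture at each
# boundary. Values and blend lengths are illustrative assumptions.

def lerp(a, b, t):
    """Linear interpolation between poses a and b at parameter t in [0, 1]."""
    return a + (b - a) * t

def play_sequence(gestures, base_pose, blend_frames=3):
    """Concatenate gesture clips, blending each boundary through base_pose."""
    timeline = []
    for clip in gestures:
        # blend from the base pose into the clip's first frame
        for i in range(1, blend_frames + 1):
            timeline.append(lerp(base_pose, clip[0], i / blend_frames))
        timeline.extend(clip[1:-1])
        # blend from the clip's last frame back to the base pose
        for i in range(1, blend_frames + 1):
            timeline.append(lerp(clip[-1], base_pose, i / blend_frames))
    return timeline

# 1-D "poses" for illustration; each clip starts and ends at the base pose.
base = 0.0
wave = [0.0, 0.8, 1.0, 0.8, 0.0]
nod  = [0.0, 0.4, 0.5, 0.4, 0.0]

frames = play_sequence([wave, nod], base)
print(len(frames))  # → 18
```

Because every clip shares the same hub pose, `play_sequence([nod, wave], base)` is equally valid; this is precisely the reuse the hub-and-spoke design buys.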