human's skeleton. However, one of the difficulties of using this method
is that the audio of the utterance in combination with the associated
facial movements must be synchronized with the body animation. In
other words, the body movement, facial movement and audio track must
be synchronized together exactly as they were captured. This means
capturing the facial animation performance at the same time as the body
movement (Stone et al., 2004), which typically requires three different
capture systems: a motion capture system for the body, a separate one
for the face, and a third audio capture for the utterance. In order to
simplify this process, the facial performance can be captured during a
separate performance, but doing so risks a lack of synchronization with
the original body performance. The best level of quality is achieved
by capturing and playing back the performance through all three
capture systems together. However, each system requires different processes in
order to obtain data that can be reused on a virtual character. Thus,
it takes a large amount of effort to integrate synchronized data from
three separate systems together. In addition, the high level of quality
comes at the expense of specificity; the performance is only meaningful
in contexts similar to those during the recorded session. For example,
consider an actor whose performance in a dialogue is captured and
synthesized onto a virtual character. By only capturing one of the two
actors and synthesizing that performance onto a virtual human, you
risk misapplying subtleties of the performance that arose in response
to the presence or movements of the other actor. A recorded interaction
might be subtly different when reused in a different conversation with
a different person. Conversational energy,
timing, backchannelling and even gaze can be different with different
partners. Thus, the high level of quality achieved by replaying an actor's
performance can be limited in its use outside of the original context.
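The integration effort described above, i.e. combining body motion, facial motion and audio from three separate capture systems, amounts to aligning three streams that share a timeline but differ in start time and sample rate. The sketch below illustrates one simple alignment strategy (trimming all streams to the latest shared start time); the stream names, rates and `align` helper are illustrative assumptions, not the API of any particular capture system.

```python
# Hypothetical sketch: synchronize three independently captured streams
# (body mocap, face mocap, audio) by a shared timecode. All names and
# rates are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Stream:
    name: str
    start_time: float   # timecode of first sample, in seconds
    rate: float         # samples per second
    samples: list       # captured frames/samples

def align(streams, t0=None):
    """Trim each stream so all begin at the latest shared start time."""
    if t0 is None:
        t0 = max(s.start_time for s in streams)
    aligned = []
    for s in streams:
        skip = round((t0 - s.start_time) * s.rate)  # samples to drop
        aligned.append(Stream(s.name, t0, s.rate, s.samples[skip:]))
    return aligned

body  = Stream("body",  start_time=0.00, rate=120.0, samples=list(range(1200)))
face  = Stream("face",  start_time=0.50, rate=60.0,  samples=list(range(600)))
audio = Stream("audio", start_time=0.25, rate=100.0, samples=list(range(1000)))

for s in align([body, face, audio]):
    print(s.name, s.start_time, len(s.samples))
```

In practice a common timecode generator (or a clap/sync marker) provides the shared `start_time` values; resampling to a common rate would follow the same pattern.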
2.3 Hub-and-spoke architectures
As an alternative to reusing a motion captured performance directly,
a hub-and-spoke architecture can be used to achieve greater reuse
of speech, gestures and facial performance. This method uses an
underlying base, or idle pose, and blends gesture motions that start
and end in a similar position as the idle pose (Shapiro, 2011). For
example, an actor will perform a number of gestures starting from the
base pose, performing the gesture, and then returning to the base pose.
Thus, each gesture can be replayed on a virtual human in a different
order with other gestures, each starting and ending at the same base
pose, which is usually implemented as a continuous idle posture. This
method allows you to synthesize an arbitrary sequence of gestures,
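The hub-and-spoke sequencing described above can be sketched as follows. This is a minimal one-dimensional illustration, not any particular system's implementation: the pose values, blend lengths and helper names are all assumptions, and real clips would be vectors of joint angles rather than single numbers.

```python
# Minimal sketch of hub-and-spoke gesture sequencing: every gesture
# clip starts and ends at a shared base (idle) pose, so clips can be
# replayed in any order, blending through the idle posture at each
# boundary. Values and blend lengths are illustrative assumptions.

def lerp(a, b, t):
    """Linear interpolation between poses a and b at parameter t in [0, 1]."""
    return a + (b - a) * t

def play_sequence(gestures, base_pose, blend_frames=3):
    """Concatenate gesture clips, blending each boundary through base_pose."""
    timeline = []
    for clip in gestures:
        # blend from the base pose into the clip's first frame
        for i in range(1, blend_frames + 1):
            timeline.append(lerp(base_pose, clip[0], i / blend_frames))
        timeline.extend(clip[1:-1])
        # blend from the clip's last frame back to the base pose
        for i in range(1, blend_frames + 1):
            timeline.append(lerp(clip[-1], base_pose, i / blend_frames))
    return timeline

# 1-D "poses" for illustration; each clip starts and ends at the base pose.
base = 0.0
wave = [0.0, 0.8, 1.0, 0.8, 0.0]
nod  = [0.0, 0.4, 0.5, 0.4, 0.0]

frames = play_sequence([wave, nod], base)
print(len(frames))  # → 18
```

Because every clip shares the same hub pose, `play_sequence([nod, wave], base)` is equally valid; this is precisely the reuse the hub-and-spoke design buys.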