required as initial evidence for the network: (1) whether an object can
be decomposed into subparts, (2) whether it has any symmetrical axes,
(3) its main axis, and (4) its position in the world. Further information
drawn upon by the decision network concerns the discourse context. It
is provided by other modules in the overall generation process and can
be accessed directly from the blackboard. All evidence available is then
propagated through the network resulting in a posterior distribution
of probabilities for the values in each chance node. By applying a
winner-takes-all rule to the posterior probability distribution, we
decide which value to enter into the feature matrix specifying the
gesture morphology. Alternatively, sampling from the posterior
distribution could be applied at this stage of decision making,
yielding non-deterministic gesture specifications.
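The two decision strategies above can be sketched as follows. This is a minimal illustration, not the system's implementation: the node name, values, and probabilities are invented, and the actual posterior would come from propagating evidence through the Bayesian network (via the Hugin toolkit) rather than being given directly.

```python
import random

def winner_takes_all(posterior):
    """Deterministic choice: the value with the highest posterior probability."""
    return max(posterior, key=posterior.get)

def sample_posterior(posterior):
    """Non-deterministic choice: draw a value in proportion to its posterior."""
    values, probs = zip(*posterior.items())
    return random.choices(values, weights=probs, k=1)[0]

# Hypothetical posterior over one chance node, e.g. a handshape feature.
posterior = {"flat": 0.55, "fist": 0.25, "point": 0.20}

feature_matrix = {"handshape": winner_takes_all(posterior)}
```

With winner-takes-all the same evidence always yields the same feature matrix; replacing it with `sample_posterior` makes repeated generations of the same referent vary.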
4.2 Modeling results
The previously described generation model has been embedded in
a larger production architecture realized using a multi-agent system
toolkit, a natural language sentence planner (SPUD; Stone et al.,
2003), the Hugin toolkit for Bayesian inference (Madsen et al., 2005),
and the ACE realization engine (Kopp and Wachsmuth, 2004). With
this prototype implementation, an embodied agent is enabled to
explain the same virtual reality buildings that were also used in
the SaGA corpus study. Equipped with the proper knowledge
sources, i.e. communicative plans, lexicon, grammar, and propositional
and imagistic knowledge about the world, the agent can randomly
pick a landmark and a certain spatial perspective towards it, and then
create its explanations, including speech and gestures, autonomously.
By simply switching between the GNetIc networks learned from
different speakers, the embodied agent's gesturing can immediately
be individualized differently. Accordingly, the gesturing behavior for
a particular referent in a given discourse context varies. Currently
the system has at its disposal five individual networks (speakers P1,
P5, P7, P8, and P15 from the SaGA corpus) as well as the average
network learned from the combined data.
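Switching between learned networks amounts to swapping one model file while the rest of the pipeline stays fixed. A minimal sketch of such a registry, with entirely hypothetical file names (the speaker labels follow the SaGA corpus speakers named above):

```python
# Hypothetical mapping from speaker label to a learned GNetIc network file.
NETWORKS = {spk: f"gnetic_{spk}.net"
            for spk in ["P1", "P5", "P7", "P8", "P15", "average"]}

def select_network(speaker="average"):
    """Return the network file to load for the requested speaker style."""
    if speaker not in NETWORKS:
        raise KeyError(f"no learned network for speaker {speaker!r}")
    return NETWORKS[speaker]
```

Individualizing the agent's gesturing then reduces to calling `select_network("P5")` (say) before generation begins.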
In Figure 3, examples are given from five different simulations, each
of which is based on exactly the same initial situation, i.e. all gestures
are referring to the same referent (a round window of a church) and
are generated in exactly the same discourse situation. The resulting
nonverbal behavior varies significantly depending on the decision
network underlying the simulation: For P7, no gesture is produced
at all, whereas for P5 and P8, static posturing gestures are produced
which, however, differ in their low-level morphology. For P5, the