the structure (for the synthesis of non-verbal expressions, symbolically
and temporally aligned with speech). The following modules are
proposed as additions to the core TTS engine: the phase-tagger, the
inner-fluidity module, the temporal-sync module, and the non-verbal
generator. The phase-tagger is needed for the symbolic synchronization of verbal and non-verbal
behavior (meaning identification and selection). The inner-fluidity
module is needed for specifying the inner-fluidity and dynamics of
the meaningful coverbal gesture sequence. Then the temporal-sync
module is needed in order to temporally co-align the propagation of
coverbal gestures with the generated verbal content's pronunciation
and prosodic features (Kröger et al., 2010). Finally, the non-verbal
generator module transforms the generated 'behavior plan' into an
abstract behavior specification that an ECA can animate within an
animation engine. In this way, the system is completely TTS-driven. It
therefore benefits from the core TTS system and from the linguistic and
prosodic features that it extracts and predicts, which are generally
used to generate the speech signal from text (e.g. stress, prominence,
phrase breaks, segment durations, pauses, etc.). Further,
in order for the system's outputs to be fully synchronized and usable
by different virtual/physical interfaces at interactive speeds, it is most
efficient for the TTS-driven approach to synthesize coverbal gestures
and speech signals simultaneously. In the PLATTOS system, this is
achieved by fusing the coverbal gesture synthesis stream with the
engine's verbal expressions synthesis stream. The verbal expressions
synthesis stream is responsible for determining linguistic and several
prosodic features of general input text and for synthesizing the speech
signal. The coverbal gesture synthesis stream is then responsible for
identifying the meaning, selecting the visual content and type of
presentation, co-aligning them with the generated speech signal, and
transforming the coverbal behavior (a sequence of gestures) into a form
that synthetic ECAs can interpret.
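The module chain described above can be illustrated with a minimal sketch. It is not the PLATTOS implementation: the `Word` records stand in for the words and predicted durations delivered by the verbal stream, and the meaning lexicon `MEANINGS`, the hold-versus-retraction rule in `inner_fluidity`, the 0.2 s preparation lead, and all function and field names are hypothetical choices made only to show how the four modules could consume the TTS-predicted features in sequence.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word as delivered by the verbal stream (hypothetical format)."""
    text: str
    duration: float  # predicted duration in seconds


@dataclass
class Gesture:
    """One coverbal gesture in the behavior plan."""
    index: int            # position of the affiliated word
    meaning: str
    prep_start: float = 0.0
    stroke_start: float = 0.0
    stroke_end: float = 0.0
    after: str = "retraction"  # or "hold"


# Hypothetical meaning lexicon; not PLATTOS data.
MEANINGS = {"big": "size", "there": "deixis", "no": "negation"}


def phase_tagger(words):
    """Symbolic synchronization: select words that carry a gesturable meaning."""
    return [Gesture(i, MEANINGS[w.text.lower()])
            for i, w in enumerate(words) if w.text.lower() in MEANINGS]


def inner_fluidity(gestures, max_gap_words=2):
    """Hypothetical fluidity rule: hold between closely spaced gestures
    instead of retracting the hands."""
    for cur, nxt in zip(gestures, gestures[1:]):
        if nxt.index - cur.index <= max_gap_words:
            cur.after = "hold"
    return gestures


def temporal_sync(words, gestures, prep_lead=0.2):
    """Temporal co-alignment: place each stroke on the predicted time span
    of its word, with preparation starting prep_lead seconds earlier."""
    onsets, t = [], 0.0
    for w in words:
        onsets.append(t)
        t += w.duration
    for g in gestures:
        g.stroke_start = onsets[g.index]
        g.stroke_end = onsets[g.index] + words[g.index].duration
        g.prep_start = max(0.0, g.stroke_start - prep_lead)
    return gestures


def non_verbal_generator(gestures):
    """Emit an engine-neutral behavior specification as plain dicts."""
    return [{"meaning": g.meaning,
             "prep_start": round(g.prep_start, 3),
             "stroke": (round(g.stroke_start, 3), round(g.stroke_end, 3)),
             "after": g.after}
            for g in gestures]


def coverbal_stream(words):
    """Run the four modules in the order in which they are listed in the text."""
    plan = phase_tagger(words)
    plan = inner_fluidity(plan)
    plan = temporal_sync(words, plan)
    return non_verbal_generator(plan)
```

For "Put the big box there" with hypothetical predicted word durations of 0.2, 0.1, 0.3, 0.3 and 0.35 s, this sketch places the "size" stroke at 0.3 to 0.6 s and marks it as a hold, because the deictic gesture on "there" follows only two words later. Because the gesture timing is read directly from the verbal stream's predictions, no separate speech analysis is needed, which mirrors the motivation for fusing the two streams.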
All modules within the system are based on three flexible and
efficient data structures: deques that are used for the flexible linking
of several of the engine's processing steps, heterogeneous relation
graphs that are used for storing extracted and predicted linguistic
and prosodic data on general input text, and finite-state machines
that are used for separating the language-dependent resources from
the system (Rojc and Kačič, 2007).
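Two of these data structures can be illustrated with a small sketch: a heterogeneous relation graph in which one set of items is shared by several relations, so that a feature predicted by a later step (here a duration) is immediately visible through every relation, and a deque that holds the processing steps. The classes, the three steps, the punctuation-based phrasing rule, and the per-letter duration model are hypothetical simplifications rather than the PLATTOS data structures, and the finite-state machines are not shown.

```python
from collections import deque


class Item:
    """One linguistic unit (e.g. a word); its features are stored once."""

    def __init__(self, **features):
        self.features = dict(features)


class Relation:
    """An ordered view over items, e.g. the 'Word' or 'Phrase' relation."""

    def __init__(self, name):
        self.name = name
        self.items = []


class Utterance:
    """Hypothetical HRG container: the same Item can appear in many relations."""

    def __init__(self, text):
        self.text = text
        self.relations = {}

    def relation(self, name):
        return self.relations.setdefault(name, Relation(name))


def tokenize(utt):
    """Create one Word item per whitespace token, keeping trailing punctuation."""
    words = utt.relation("Word")
    for tok in utt.text.split():
        word = tok.rstrip(".,;!?")
        words.items.append(Item(name=word.lower(), punc=tok[len(word):]))
    return utt


def phrasing(utt):
    """Hypothetical rule: close a phrase after punctuation. Words are shared, not copied."""
    phrases = utt.relation("Phrase")
    current = []
    for item in utt.relation("Word").items:
        current.append(item)
        if item.features["punc"]:
            phrases.items.append(Item(brk=item.features["punc"], daughters=current))
            current = []
    if current:
        phrases.items.append(Item(brk="", daughters=current))
    return utt


def durations(utt, per_char=0.06):
    """Toy duration model (seconds per letter), written into the shared Word items."""
    for item in utt.relation("Word").items:
        item.features["dur"] = per_char * len(item.features["name"])
    return utt


def run(utt, steps):
    """Process an utterance through a deque of steps, taken from the front in order."""
    pending = deque(steps)
    while pending:
        utt = pending.popleft()(utt)
    return utt


utt = run(Utterance("Hello there, big world."), [tokenize, phrasing, durations])
first_phrase = utt.relation("Phrase").items[0]
print([w.features["name"] for w in first_phrase.features["daughters"]])  # ['hello', 'there']
```

Although `durations` runs after `phrasing`, the phrase's daughters already carry the predicted durations, because both relations point to the same `Item` objects. Since each step only reads and writes the shared utterance, a step can be added to either end of the deque without changing the others, which is one plausible reading of the flexible linking described above.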
4. Grammar
The lexical affiliation and gesture dictionaries (Gesticons) have been
described by Schegloff (1985) as: “word or words deemed to correspond