The link between such verbal and coverbal information is established through linguistic and prosodic features, such as pauses, vocalization, and nasalization. In many cases these signals shape the speech flow and can also be identified by distinct sounds (e.g. fillers such as 'hmm...', 'ehhh...'). Further, coverbal content selection and non-verbal behavior planning (coverbal alignment), as described in this chapter, are fused into a common engine in an efficient and flexible way. The core is the text-to-speech synthesis (TTS) engine (Rojc and Kačič, 2007), enriched with additional modules devoted to the synthesis of non-verbal expressions. The realization engine, which then reproduces the generated verbal and non-verbal behavior (including co-articulation), is a proprietary, modular, hierarchically oriented ECA-based engine (Mlakar and Rojc, 2011), capable of animating both procedural and key-frame (key-shape)-based animations.
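To illustrate what key-frame (key-shape)-based animation involves, the following is a minimal sketch of linear interpolation between gesture key-frames. It is not the ECA engine's actual implementation; the `KeyFrame` type and the gesture values are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class KeyFrame:
    time: float   # seconds from gesture start (hypothetical units)
    value: float  # e.g. a joint angle or blend-shape weight

def interpolate(frames, t):
    """Linearly interpolate a pose parameter between sorted key-frames;
    times outside the range clamp to the first/last key-frame."""
    if t <= frames[0].time:
        return frames[0].value
    if t >= frames[-1].time:
        return frames[-1].value
    for a, b in zip(frames, frames[1:]):
        if a.time <= t <= b.time:
            alpha = (t - a.time) / (b.time - a.time)
            return a.value + alpha * (b.value - a.value)

# A hand-raise gesture: rest -> stroke -> hold -> retraction
gesture = [KeyFrame(0.0, 0.0), KeyFrame(0.4, 1.0),
           KeyFrame(0.8, 1.0), KeyFrame(1.2, 0.0)]
print(interpolate(gesture, 0.2))  # halfway into the stroke -> 0.5
```

A procedural animation would compute `value` from a formula at runtime instead of interpolating stored key-frames; an engine supporting both can blend the two per joint.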
The architecture of the proposed system for generating coverbal conversational behavior is presented in Figure 1. The language-dependent resources are separated from the language-independent engine by using the FSM formalism. A common queuing mechanism allows the flexible, efficient, and easy integration of new algorithms and additional modules. This is especially useful in our case, since, in addition to the modules used for speech synthesis (tokenizer, part-of-speech tagger, grapheme-to-phoneme conversion, symbolic and acoustic prosody, unit selection, concatenation, and acoustic processing), several modules have to be integrated into

Figure 1. The PLATTOS TTS-driven behavior generation system's engine.
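The queuing mechanism described above can be sketched as a chain of modules, each consuming items from an input queue and feeding an output queue. This is a minimal illustration under stated assumptions, not the PLATTOS implementation; the `Module` class and the two stand-in stages are hypothetical.

```python
from collections import deque

class Module:
    """One pipeline stage: pulls items from its input queue,
    applies its processing function, pushes to its output queue."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def run(self, inbox, outbox):
        while inbox:
            outbox.append(self.fn(inbox.popleft()))

def run_pipeline(modules, utterance):
    """Chain modules through shared queues; adding a new module is
    just inserting another stage into the list."""
    queue = deque([utterance])
    for module in modules:
        next_queue = deque()
        module.run(queue, next_queue)
        queue = next_queue
    return list(queue)

# Hypothetical stand-ins for two of the speech-synthesis stages:
pipeline = [
    Module("tokenizer", lambda text: text.split()),
    Module("pos-tagger", lambda toks: [(t, "X") for t in toks]),
]
print(run_pipeline(pipeline, "hello world"))
# -> [[('hello', 'X'), ('world', 'X')]]
```

Because each stage only sees queues, a behavior-generation module (e.g. gesture planning) can be spliced in after prosody without touching the other stages, which is the flexibility the common queuing mechanism provides.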