performed over the whole word 'samo' ( only ), since the ending of the
PA-syllable also identifies the ending of the word. The NA-syllables are
used to identify the duration of the preparation/retraction movement
phases. The interval of a retraction/preparation phase is delimited
by the beginning of the NA-syllable of the retracting/preparing word
(or word phrase) and by the ending of that word (or word phrase). The retraction/
preparation phases of the sequence are the words 'vedno' ( always ) and
'in' ( and ), and the utterance 'raze'. At the end of each phase, the ECA
will overlay the designated articulated configuration (e.g. pre-stroke
shape, or neutral shape). The temporal-sync deque then temporally
aligns the movement phases with the pronunciation rate of
the phonemes/visemes and the silences. This is implemented using
the proposed equations (1-6) and the predicted temporal information
stored in the Segments layer. Figure 11 shows the absolute
durations and the corresponding relative temporal positions (with regard
to the beginning of the text sequence). For instance, the duration of the
preparation phase of CU-1 is determined to be 0.428 s. It is followed
by a stroke that is determined to last 0.287 s. Finally, the stroke shape
of CU-1 has to be maintained for 0.224 s. The execution of CU-2 has to
be withheld for the duration of CU-1 and starts after 0.939 s. CU-2
then begins with a preparation (0.460 s) and a pre-stroke-hold
(0.080 s). This sequence represents the PI-3 unit, and the shape
overlaid is described as right_B1, Fr|Ce|No|O. In a similar way, the
temporal information is added to all other PI and CU units.
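The sequential scheduling of the movement phases described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the system's actual implementation: the `Phase` class and `schedule` function are hypothetical names, and the durations are the ones quoted in the worked example. The sketch shows why CU-2 must be withheld until 0.939 s, since that is simply the sum of the durations of CU-1's preparation, stroke, and stroke-hold phases.

```python
# Hypothetical sketch: convert absolute phase durations into relative
# start times so that each conversational unit (CU) begins only after
# the preceding phases have finished. Names are illustrative only.

from dataclasses import dataclass

@dataclass
class Phase:
    unit: str        # conversational unit the phase belongs to
    name: str        # movement phase (preparation, stroke, hold, ...)
    duration: float  # absolute duration in seconds

def schedule(phases):
    """Assign each phase a start time relative to the sequence start."""
    timeline, t = [], 0.0
    for p in phases:
        timeline.append((p.unit, p.name, round(t, 3), p.duration))
        t += p.duration
    return timeline

# Durations taken from the worked example in the text.
phases = [
    Phase("CU-1", "preparation", 0.428),
    Phase("CU-1", "stroke", 0.287),
    Phase("CU-1", "stroke-hold", 0.224),
    Phase("CU-2", "preparation", 0.460),
    Phase("CU-2", "pre-stroke-hold", 0.080),
]

for unit, name, start, dur in schedule(phases):
    print(f"{unit} {name}: starts at {start} s, lasts {dur} s")
# CU-2's preparation starts at 0.428 + 0.287 + 0.224 = 0.939 s,
# matching the withholding of CU-2 described in the text.
```

Cumulative summation of this kind is all that is needed once the per-phase durations have been predicted; the harder part, handled by equations (1-6), is deriving those durations from the phoneme/viseme timing in the first place.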
In order to re-create the non-verbal behavior generated by
the system and presented in Figure 11 (conversational shapes), the
system finally generates an EVA-SCRIPT-based behavior description
containing lip-sync and gesture information. This final step
is performed by the non-verbal generator deque. The output is then
synthesized by the conversational agent EVA, as symbolically and
temporally co-aligned communicative behavior. The synthetic coverbal
gestures were also evaluated by staff members and students. They
evaluated lip-sync, the symbolic representations of meaningful words,
and the alignment of coverbal gestures with the synthesized speech.
All of the evaluators agreed that the speech and the visual pronunciation
were in temporal sync; however, 35% of them suggested improving
the correlation between visual and audio stress. Furthermore, 55% of the
observed sequences adequately represented the verbal content, while in
30% of the sequences a meaningful-word mismatch was observed.
Based on verbal information, the evaluators expected another word to
be represented. However, when the meaningful word was suggested to
them, most of them agreed that the representation was adequate, and
appeared more natural. Finally, 15% of the sequences were evaluated