After the transition between sequential shapes is completed, the overlaid shape (the excited state) has to be maintained; this hold is equivalent to the PI's persistence attribute. The non-verbal generator deque also forms the description of the facial expressions required for the lip-sync process. In this case, the visual sequence of visemes/phonemes is described in the form of 'viseme' EVA-SCRIPT tags, encapsulated within the 'speech' tag. The required information is stored in the Segment units of the Segment relation layer. Each segment (including sil) represents one 'viseme' tag within the lip-sync specification and, at the same time, a gestural affiliate for a facial expression that, within the mouth region, overlays the pronunciation of the segment. The duration of each segment is also available; it is equivalent to the duration attribute of the EVA-SCRIPT 'viseme' tag. Proper transitions between segments are handled internally by the EVA animation engine, as proposed in Rojc and Mlakar (2011).
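As an illustration, the mapping from Segment units onto the lip-sync specification can be sketched as follows. This is a minimal sketch, not the EVA engine's actual implementation: only the 'speech' and 'viseme' tag names and the duration attribute come from the description above, while the Segment type, the 'shape' attribute name, and the emit_speech_tag helper are hypothetical.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET

@dataclass
class Segment:
    phoneme: str      # segment label from the TTS engine, e.g. 'v' or 'sil'
    duration_ms: int  # segment duration, mapped to the 'duration' attribute

def emit_speech_tag(segments):
    """Wrap one 'viseme' tag per segment (including sil) in a 'speech' tag."""
    speech = ET.Element("speech")
    for seg in segments:
        # 'shape' is a hypothetical attribute name for the viseme label
        ET.SubElement(speech, "viseme",
                      shape=seg.phoneme,
                      duration=str(seg.duration_ms))
    return ET.tostring(speech, encoding="unicode")

print(emit_speech_tag([Segment("v", 80), Segment("e", 120), Segment("sil", 60)]))
```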
6. Evaluation
In order to test the algorithms of the proposed TTS-driven behavioral generation system, and to evaluate the quality and naturalness of the generated output, we annotated over 35 minutes of the proprietary video corpus and created a gestural dictionary containing 300 distinct conversational configurations of arms and hands, described in the form of EVA-SCRIPT shape models. The shape models varied in structure (the base represented shape) and in the intensity of the shape. Based on the annotation and the literature, these shape configurations were manually linked to verbal information (words and phrases), and they can be automatically selected and temporally adjusted by the behavioral generation process. Regarding the reliability of the resources, we rely on findings presented in the literature (e.g., Kita et al., 1998; Loehr, 2004); however, we have also ensured that meaningful words (manually identified, serving only for the purpose of semiotic-grammar evaluation) in the experimental sequences had at least one representative affiliate stored within the gesture dictionary, accessible through either semiotic or implicit rules.
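A dictionary lookup of this kind can be sketched minimally as below, assuming a plain word-to-shape-model mapping; the ShapeModel type, the sample entries, and the implicit-rule fallback are hypothetical illustrations, not the actual 300-entry dictionary or its rules.

```python
from typing import NamedTuple

class ShapeModel(NamedTuple):
    name: str         # EVA-SCRIPT shape-model identifier (hypothetical)
    intensity: float  # intensity of the shape

# Hypothetical entries; the actual dictionary holds 300 distinct
# arm/hand configurations linked to words and phrases.
GESTURE_DICT = {
    "iste": [ShapeModel("beat_emphasis", 0.8)],
    "obraze": [ShapeModel("iconic_face_outline", 0.6)],
}

def affiliates(word, implicit_fallback):
    """Semiotic lookup by word; otherwise fall back to an implicit rule."""
    return GESTURE_DICT.get(word.lower(), [implicit_fallback])

print(affiliates("iste", ShapeModel("neutral_beat", 0.4)))
```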
Further, we re-created and animated several text sequences. For the text sequence 'Vedno iste, in samo iste obraze' (Always the same, and only the same faces), the TTS-driven system first runs the phase-tagging-deque. The HRG structure at this level already contains morphological information about the input text sequence, stored within the Word relation layer: 'Vedno/R iste/P- ,/XCOMMA in/C samo/Q iste/P- obraze/N ./XPERIOD'. Based on word-type order (semiotic patterns), the system is then able to identify two meaningful phrases in this sequence.
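This identification step can be viewed as pattern matching over consecutive word-type tags. The following minimal sketch uses hypothetical placeholder patterns rather than the system's actual semiotic grammar:

```python
TAGGED = "Vedno/R iste/P- ,/XCOMMA in/C samo/Q iste/P- obraze/N ./XPERIOD"
tokens = [tok.rsplit("/", 1) for tok in TAGGED.split()]

# Hypothetical placeholder patterns over consecutive word-type tags.
PATTERNS = [["R", "P-"], ["Q", "P-", "N"]]

for pattern in PATTERNS:
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        if [tag for _, tag in tokens[i:i + n]] == pattern:
            words = " ".join(word for word, _ in tokens[i:i + n])
            print("/".join(pattern), "->", words)
```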