systematic approach could be to customize the face shapes from real data
using machine learning techniques.
The development of “Baldi” is not based on an open framework; its components, such as the spatial deformation model and the temporal deformation model, are created and tuned exclusively for “Baldi”. Therefore, it is difficult for other researchers to incorporate components of “Baldi” into other animation systems. It would be highly desirable for the development of “Baldi” to be formulated as a more general methodology, so that other researchers could reuse and refine its components in the future.
2.3.2 Our ongoing and future work
In contrast to “Baldi”, one of the goals of our research is to provide a general, unified framework to guide the development of face motion modeling, analysis, and synthesis. Such a framework could yield compact and efficient animation tools that users with various backgrounds (e.g., psychologists) can use to create animations suitable for their applications. Conversely, we make use of feedback from those applications to devise general principles that guide the refinement of the synthesis.
The current target application for evaluating our face synthesis is lip-reading. In this application, face animations synchronized with speech are generated and presented to hearing-impaired people. If the face animations are lip-readable, they will help hearing-impaired people better understand the speech. We plan to conduct human perception studies to identify hypotheses about the visual factors that are important to lip-reading. These hypotheses can then be used to guide the improvement of the face synthesis.
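Such a perception study ultimately produces per-stimulus recognition data. As a minimal sketch (the function name, data layout, and example responses are hypothetical, not from the study itself), per-digit recognition accuracy could be tallied from subjects' reports to reveal which visemes the synthesis renders poorly:

```python
# Hypothetical scoring sketch for a lip-reading perception study:
# subjects view animations of isolated digits and report what they
# recognized; low per-digit accuracy flags weakly rendered visemes.
from collections import defaultdict

def score_responses(trials):
    """trials: list of (presented_digit, reported_digit) pairs.

    Returns a dict mapping each presented digit to the fraction of
    trials in which it was correctly recognized.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for presented, reported in trials:
        total[presented] += 1
        if reported == presented:
            correct[presented] += 1
    return {digit: correct[digit] / total[digit] for digit in total}

# Illustrative (made-up) responses: "three" confused with "free"
# suggests the interdental /TH/ lacks sufficient visual cues.
trials = [("three", "three"), ("three", "free"), ("two", "two")]
print(score_responses(trials))  # → {'three': 0.5, 'two': 1.0}
```

Aggregating confusions this way (e.g., which digits are mistaken for which) is one simple route from raw subject responses to hypotheses about the visual factors that matter.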
In our preliminary experiments, we first create animations for isolated digits. These animations are then presented to human subjects. The current subjects are one PhD student and one faculty member, both of whom have lip-reading experience. In the first test, we assess the lip-readability of face animation produced using the geometric motion model only. We find that the following factors limit lip-readability: (1) the animation lacks wrinkles and shading changes in the lip area, so that lip rounding and protrusion are difficult to perceive when their durations are short; (2) the hand-crafted tongue and teeth motions do not provide enough visual cues to recognize interdental phonemes such as /TH/ in “three”. In addition, certain unnatural synthesis results in the mouth interior are distracting for lip-reading. In the second test, we augment the animation by using the appearance model to synthesize texture variation. The results show that the perception of subtle lip rounding, protrusion, and stretching is considerably improved because of the added appearance variations. The appearance model also handles the complex details inside the mouth. As a result, the recognition of interdental phonemes such as /TH/ is improved. However, subtle dynamic appearance