systematic approach could be to customize the face shapes from real data
using machine learning techniques.
The development of “Baldi” is not based on an open framework; its components, such as the spatial deformation model and the temporal deformation model, are created and tuned exclusively for “Baldi”. Therefore, it is difficult for other researchers to incorporate components of “Baldi” into other animation systems. It would be highly desirable for the development of “Baldi” to be formulated as a more general methodology, so that other researchers could reuse and refine its components in the future.
2.3.2 Our ongoing and future work
In contrast to “Baldi”, one of the goals of our research is to provide a general, unified framework to guide the development of face motion modeling, analysis, and synthesis. Such a framework could yield compact and efficient animation tools that users with various backgrounds (e.g., psychologists) can use to create animations suitable for their applications. Conversely, we make use of feedback from those applications to devise general principles that guide the refinement of the synthesis.
The current target application for evaluating our face synthesis is lip-reading. In this application, face animations synchronized with speech are generated and presented to hearing-impaired people. If the face animations are lip-readable, they will help hearing-impaired people better understand the speech. We plan to conduct human perception studies to identify hypotheses about the visual factors that are important to lip-reading. These hypotheses can then be used to guide the improvement of the face synthesis.
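Such a perception study ultimately produces per-stimulus recognition data. As a minimal sketch (the function name, data layout, and example responses are hypothetical, not from the study itself), per-digit recognition accuracy could be tallied from subjects' reports to reveal which visemes the synthesis renders poorly:

```python
# Hypothetical scoring sketch for a lip-reading perception study:
# subjects view animations of isolated digits and report what they
# recognized; low per-digit accuracy flags weakly rendered visemes.
from collections import defaultdict

def score_responses(trials):
    """trials: list of (presented_digit, reported_digit) pairs.

    Returns a dict mapping each presented digit to the fraction of
    trials in which it was correctly recognized.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for presented, reported in trials:
        total[presented] += 1
        if reported == presented:
            correct[presented] += 1
    return {digit: correct[digit] / total[digit] for digit in total}

# Illustrative (made-up) responses: "three" confused with "free"
# suggests the interdental /TH/ lacks sufficient visual cues.
trials = [("three", "three"), ("three", "free"), ("two", "two")]
print(score_responses(trials))  # → {'three': 0.5, 'two': 1.0}
```

Aggregating confusions this way (e.g., which digits are mistaken for which) is one simple route from raw subject responses to hypotheses about the visual factors that matter.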
In our preliminary experiments, we first create animations for isolated digits. These animations are then presented to human subjects. The current subjects are one PhD student and one faculty member, both of whom have lip-reading experience. In the first test, we assess the lip-readability of face animation produced using the geometric motion model only. We find that the following factors limit lip-readability: (1) the animation lacks wrinkles and shading changes in the lip area, so that lip rounding and protrusion are difficult to perceive when their durations are short; (2) the hand-crafted tongue and teeth motions do not provide enough visual cues to recognize interdental phonemes such as /TH/ in “three”. In addition, certain unnatural synthesis results in the mouth interior are distracting for lip-reading. In the second test, we augment the animation by using the appearance model to synthesize texture variation. The results show that the perception of subtle lip rounding, protrusion, and stretching is considerably improved because of the added appearance variations. The appearance model also handles the complex details inside the mouth. As a result, the recognition of interdental phonemes such as /TH/ is improved. However, subtle dynamic appearance