Techniques for Face Motion & Expression Analysis on Monocular Images - 3D Modeling and Animation: Synthesis and Analysis Techniques for the Human Body

performed on frontal view faces. Features are studied using Gabor filters and afterwards classified using a previously trained HMM. The HMM is applied in two ways:

• taking Gabor representations as inputs, and
• taking support vector machine (SVM) outputs as inputs.
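
To illustrate how a trained HMM can classify a feature sequence, here is a minimal sketch of the forward algorithm. It is not the system described above, and it rests on several assumptions: the inputs (Gabor representations or SVM outputs) are taken to be already quantized into discrete symbols, one HMM is assumed per motion class, and the model names `brow_raise` and `blink` and all probabilities are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM.

    Uses the forward algorithm with per-step scaling to avoid underflow.
    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like


def classify(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state, left-to-right models over two quantized symbols (0 and 1).
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "brow_raise": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "blink": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

print(classify([0, 0, 1, 1], MODELS))
print(classify([1, 1, 0, 0], MODELS))
```

Under these made-up parameters, a sequence that moves from symbol 0 to symbol 1 is best explained by the first model and the reverse sequence by the second. A system working on continuous features would replace the discrete emission table with a continuous density.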

SVMs are used as classifiers. They tend to achieve good generalization rates compared to other classifiers because they focus on the maximally informative exemplars, the support vectors. To match face features, the authors first convolve them with a set of kernels (taken from the Gabor analysis) to form a jet. That jet is then compared with a collection of jets taken from training images, and the similarity value of the closest one is kept. In their study, Bartlett et al. report AU detection accuracies ranging from 80% for eyebrow motion to around 98% for eye blinks.
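
The jet matching step can be sketched as follows. This is a simplified illustration rather than the implementation used by Bartlett et al.: the kernel parameters, the use of response magnitudes only, the cosine similarity measure, and the helper names `gabor_kernel`, `jet_at`, `similarity` and `best_match` are all assumptions made for the example.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: a Gaussian envelope times a plane wave (simplified, not DC-free)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along_wave = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * cmath.exp(2j * math.pi * along_wave / wavelength))
        kernel.append(row)
    return kernel


def jet_at(image, cx, cy, kernels):
    """Magnitudes of all kernel responses at pixel (cx, cy).

    The sum below is a correlation; for a real image and these kernels its
    magnitude equals that of the corresponding convolution.
    """
    jet = []
    for kernel in kernels:
        half = len(kernel) // 2
        response = 0j
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                response += image[cy + dy][cx + dx] * kernel[dy + half][dx + half]
        jet.append(abs(response))
    return jet


def similarity(jet_a, jet_b):
    """Cosine similarity between two jets of response magnitudes."""
    dot = sum(a * b for a, b in zip(jet_a, jet_b))
    norm_a = math.sqrt(sum(a * a for a in jet_a))
    norm_b = math.sqrt(sum(b * b for b in jet_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(jet, training_jets):
    """Similarity value of the closest jet in the training collection."""
    return max(similarity(jet, other) for other in training_jets)


# Two toy 9x9 images: vertical stripes and horizontal stripes (period 4 pixels).
vertical = [[1.0 if x % 4 < 2 else 0.0 for x in range(9)] for y in range(9)]
horizontal = [[1.0 if y % 4 < 2 else 0.0 for x in range(9)] for y in range(9)]

kernels = [gabor_kernel(5, 4.0, theta, 2.0) for theta in (0.0, math.pi / 2)]
jet_vertical = jet_at(vertical, 4, 4, kernels)
jet_horizontal = jet_at(horizontal, 4, 4, kernels)

print(best_match(jet_vertical, [jet_vertical, jet_horizontal]))
print(best_match(jet_vertical, [jet_horizontal]))
```

With these toy images, a jet matches itself with a similarity close to 1, while the jet from the differently oriented texture scores much lower. A practical system would use more orientations and frequencies than the two kernels shown here.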

CMU has opted for another approach, in which face features are modeled as multi-state facial components. Neural networks are used to derive the AUs associated with the observed motion. Facial models have been developed for lips, eyes, brows, cheeks and furrows. In their article, Tian et al. (2001) describe this technique, giving details about the models and the use of two neural networks (NNs), one for the upper part of the face and a different one for the lower part. (See Figure 6.) They do not discuss the image processing involved in deriving the feature model from the images. Tests are performed on a database of faces recorded under controlled lighting conditions. Their system allows the analysis of faces that are not completely frontal, although most tests were performed only on frontal view faces. The average recognition rates achieved are around 95.4% for upper face AUs and 95.6% for lower face AUs.

Piat and Tsapatsoulis (2000) approach the challenge of deducing facial expression from images from another perspective, one no longer based on FACS. Their technique first finds the action parameters (MPEG-4 FAPs) related to the expression being analyzed and then formulates this expression with high-level semantics. To do so, they have related the intensity of the most commonly used expressions to their associated FAPs. Other approaches (Chen & Huang, 2000) complement the image analysis with the study of the human voice to extract more emotional information. These studies aim to develop the means to create a fully bimodal Human-Computer Interface (HCI).
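
As a rough illustration of relating FAP values to an expression and its intensity, the sketch below fits each expression profile to a set of observed FAP values and reports the best fit in words. It is not the method of Piat and Tsapatsoulis: the profile weights, the least-squares fit, the intensity thresholds and the helper names `fit` and `describe` are hypothetical, chosen only to make the idea concrete.

```python
# Hypothetical profiles: relative FAP activation at full intensity of each expression.
# The FAP names follow MPEG-4 naming, but the weights are invented for this example.
PROFILES = {
    "joy": {"stretch_l_cornerlip": 1.0, "stretch_r_cornerlip": 1.0, "open_jaw": 0.3},
    "surprise": {"raise_l_i_eyebrow": 1.0, "raise_r_i_eyebrow": 1.0, "open_jaw": 1.0},
}


def fit(faps, profile):
    """Least-squares scale of a profile onto observed FAPs, plus the squared residual."""
    keys = set(faps) | set(profile)
    norm_sq = sum(v * v for v in profile.values())
    dot = sum(faps.get(k, 0.0) * profile.get(k, 0.0) for k in keys)
    scale = max(dot / norm_sq, 0.0)  # an expression cannot have negative intensity
    residual = sum((faps.get(k, 0.0) - scale * profile.get(k, 0.0)) ** 2 for k in keys)
    return scale, residual


def describe(faps, profiles):
    """Return (expression, intensity label) for the best-fitting profile."""
    name = min(profiles, key=lambda n: fit(faps, profiles[n])[1])
    intensity = fit(faps, profiles[name])[0]
    if intensity < 0.4:
        label = "slight"
    elif intensity < 0.8:
        label = "moderate"
    else:
        label = "strong"
    return name, label


# Observed FAP values normalized to [0, 1] (an assumption of this sketch).
observed = {"raise_l_i_eyebrow": 0.5, "raise_r_i_eyebrow": 0.5, "open_jaw": 0.5}
print(describe(observed, PROFILES))
```

Here the observed values are exactly half of the hypothetical surprise profile, so the fit selects that profile with a mid-range intensity.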

Pantic and Rothkrantz (2000) provide overviews and comparative studies of many techniques, including some of those just discussed, analyzed from the HCI perspective.
