Game Development Reference
In-Depth Information
performed on frontal view faces. Features are studied using Gabor filters and
afterwards classified using a previously trained HMM. The HMM is applied in
two ways:
taking Gabor representations as inputs, and
taking support vector machine (SVM) outputs as inputs.
SVMs are used as classifiers. They tend to generalize well compared with other
classifiers because they focus on maximally informative exemplars, the support
vectors. To match face features, the authors first convolve them with a set of
kernels (derived from the Gabor analysis) to form a jet. That jet is then
compared with a collection of jets taken from training images, and the
similarity value of the closest one is retained. In their study, Bartlett et al.
claim an AU detection accuracy ranging from 80% for eyebrow motion to around
98% for eye blinks.
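The jet matching step described above can be sketched roughly as follows. This is a minimal illustration, not the implementation used by Bartlett et al.: the kernel parameters, the use of response magnitudes, the normalized dot product as similarity measure, and all function names are assumptions made for the example, and the "faces" are synthetic stripe images.

```python
import cmath
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel as a list of rows (size is assumed to be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate measured along the wave direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = cmath.exp(2j * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_response(image, kernel, cx, cy):
    """Kernel response centred on pixel (cx, cy); pixels outside count as 0.

    The kernel is applied without flipping (correlation), which is enough
    for this sketch because only the magnitude is used afterwards.
    """
    half = len(kernel) // 2
    total = 0j
    for ky, row in enumerate(kernel):
        for kx, k in enumerate(row):
            y, x = cy + ky - half, cx + kx - half
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += image[y][x] * k
    return total

def make_jet(image, cx, cy, kernels):
    """Jet: one response magnitude per kernel at the given point."""
    return [abs(filter_response(image, k, cx, cy)) for k in kernels]

def jet_similarity(a, b):
    """Normalized dot product of two jets (assumed similarity measure)."""
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / norm if norm > 0 else 0.0

def best_match(jet, training_jets):
    """Return (label, similarity) of the closest training jet."""
    scored = ((label, jet_similarity(jet, t)) for label, t in training_jets.items())
    return max(scored, key=lambda pair: pair[1])

def stripes(vertical, size=16):
    """Synthetic test image: stripes two pixels wide (period 4)."""
    return [[1.0 if ((x if vertical else y) // 2) % 2 == 0 else 0.0
             for x in range(size)] for y in range(size)]

# Four orientations at a single scale; illustrative values only.
KERNELS = [gabor_kernel(9, 4.0, k * math.pi / 4, 2.0) for k in range(4)]

# Hypothetical "training" jets taken at the image centre.
training = {
    "vertical": make_jet(stripes(True), 8, 8, KERNELS),
    "horizontal": make_jet(stripes(False), 8, 8, KERNELS),
}

# Probe jet taken one pixel away from the training location.
probe = make_jet(stripes(True), 9, 8, KERNELS)
label, score = best_match(probe, training)
print(label, round(score, 3))
```

In a real system the jets would be extracted at facial feature points over several scales and orientations, and the resulting similarity values (or the Gabor representations themselves) would feed the SVM and HMM stages mentioned in the text.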
CMU has opted for another approach, in which face features are modeled as
multi-state facial components for analysis. Neural networks (NNs) are used to
derive the AUs associated with the observed motion. Facial models have been
developed for lips, eyes, brows, cheeks and furrows. Tian et al. (2001) describe
this technique, giving details about the models and the use of two NNs, one for
the upper part of the face and another for the lower part. (See Figure 6.)
They do not discuss the image processing involved in the derivation of the feature
model from the images. Tests are performed over a database of faces recorded
under controlled light conditions. Their system allows the analysis of faces that
are not completely in a frontal position, although most tests were performed only
on frontal view faces. The average recognition rates achieved are around 95.4%
for upper face AUs and 95.6% for lower face AUs.
Piat and Tsapatsoulis (2000) approach the challenge of deducing facial
expression from images from another perspective, one no longer based on FACS.
Their technique first finds the action parameters (MPEG-4 FAPs) related to the
expression being analyzed and then formulates this expression with high-level
semantics. To do so, they relate the intensity of the most commonly used
expressions to their associated FAPs. Other approaches (Chen & Huang, 2000)
complement the image analysis with the study of the human voice to extract more
emotional information. These studies aim to develop the means to create a fully
bimodal Human-Computer Interface (HCI).
The reader can find in Pantic and Rothkrantz (2000) overviews and comparative
studies of many techniques, including some of those just discussed, analyzed
from the HCI perspective.