region of the input space, and multiple classifiers can complement each other. To the
best of our knowledge, only a few studies in facial expression recognition have paid
attention to multi-classifier fusion. To exploit this advantage, in this paper we also
extend a decision-rule-based multi-classifier fusion framework to facial expression
recognition.
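Decision-rule fusion can be made concrete with a short sketch. The following Python fragment is a minimal illustration assuming each classifier outputs per-class posterior probabilities; the sum, product, and max rules shown here follow the standard taxonomy and are illustrative assumptions rather than the exact formulation used in our framework.

import numpy as np

def fuse(prob_list, rule="sum"):
    """Combine per-classifier class-probability vectors into one decision.

    prob_list: list of arrays of shape (n_classes,), one per classifier.
    Returns the index of the winning expression class.
    """
    P = np.vstack(prob_list)           # (n_classifiers, n_classes)
    if rule == "sum":                  # average the posteriors
        scores = P.mean(axis=0)
    elif rule == "product":            # multiply the posteriors
        scores = P.prod(axis=0)
    elif rule == "max":                # most confident single vote
        scores = P.max(axis=0)
    else:
        raise ValueError("unknown rule: %s" % rule)
    return int(np.argmax(scores))

# Example: three classifiers voting over six basic expressions.
svm_p    = np.array([0.10, 0.55, 0.05, 0.10, 0.10, 0.10])
boost_p  = np.array([0.20, 0.40, 0.10, 0.10, 0.10, 0.10])
fisher_p = np.array([0.15, 0.35, 0.20, 0.10, 0.10, 0.10])
print(fuse([svm_p, boost_p, fisher_p], rule="sum"))  # -> 1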
In this paper, we propose a novel component-based approach for facial expression
recognition from video sequences. Inspired by the methods presented in [15,21], 38
important facial interest regions are first determined based on prior information, and
spatiotemporal feature descriptors are then used to describe facial expressions in these
areas. Furthermore, we use AdaBoost to select the most discriminative features across
all components. In the classification step, we present a framework for fusing the
recognition results of several classifiers, such as support vector machines, boosting,
and the Fisher discriminant classifier, to exploit the complementary information among
them. Extensive experiments on the Cohn-Kanade facial expression database [22] are
carried out to evaluate the performance of the proposed approach.
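As a rough illustration of the feature-selection step, the sketch below uses scikit-learn's AdaBoost implementation to rank concatenated component descriptors by importance and keep the strongest ones. The descriptor dimensionality, the number of kept features, and the use of scikit-learn are assumptions made for illustration, not details of our implementation.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def select_features(X, y, n_keep=200):
    """Rank features by AdaBoost importance and keep the top n_keep.

    X: (n_samples, n_features) descriptor matrix; y: expression labels.
    """
    booster = AdaBoostClassifier(n_estimators=100).fit(X, y)
    order = np.argsort(booster.feature_importances_)[::-1]
    keep = order[:n_keep]
    return X[:, keep], keep

# Hypothetical usage with random stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 1000))   # 120 sequences, 1000 descriptor bins
y = rng.integers(0, 6, size=120)   # six basic expressions
X_sel, idx = select_features(X, y)
print(X_sel.shape)                 # (120, 200)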
2 Boosted Component-Based Spatiotemporal Feature Descriptor
2.1 Facial Interest Points
In many earlier methods [1,2,23], the fusion of geometric and appearance features has
been shown to improve the performance of expression recognizers. Geometric features
are usually formed from parameters obtained by tracking facial action units or the
variation of facial points. It is well known that not all features from the whole face are
critical to expression recognition. Yeasin et al. [14] proposed applying optical flow to
regions defined by the positions of the eyes, eyebrows, and mouth. Zhang et al. [24]
developed a framework in which Gabor wavelet coefficients were extracted at 34
fiducial points in the face image. In methods based on the scale-invariant feature
transform (SIFT) [21], keypoints are first extracted from a set of reference images to
avoid computing descriptors at every point in an image. The search for interest points
or regions in facial images is thus crucial to component-based approaches.
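To make the fiducial-point representation of Zhang et al. [24] concrete, the following sketch samples multi-scale, multi-orientation Gabor magnitudes at given points using OpenCV; the filter parameters and scale schedule are illustrative assumptions, not values from the cited work.

import cv2
import numpy as np

def gabor_at_points(gray, points, n_orient=8, n_scale=5):
    """Sample multi-scale, multi-orientation Gabor magnitudes at points."""
    feats = []
    for s in range(n_scale):
        lambd = 4.0 * (2 ** (s / 2.0))   # assumed wavelength schedule
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            kern = cv2.getGaborKernel((21, 21), 4.0, theta, lambd, 0.5)
            resp = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern)
            feats.append([abs(resp[y, x]) for (x, y) in points])
    return np.array(feats).T             # one coefficient row per point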
However, faces differ from generic objects: the features important for facial
expressions are concentrated in particular regions, such as the mouth and cheeks.
Thus, unlike SIFT, our interest point detection is based on prior experience. In this
paper, 38 facial points are considered, as shown in Fig. 1(a).
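A minimal sketch of how spatiotemporal volumes can be cropped around such interest points is given below; the block half-size is an assumption, and any descriptor computed on the returned cuboids stands in for the spatiotemporal features described above.

import numpy as np

def extract_cuboids(frames, points, half=16):
    """Crop a (T, 2*half, 2*half) volume around each interest point.

    frames: (T, H, W) grayscale sequence; points: list of (x, y) tuples.
    """
    T, H, W = frames.shape
    cuboids = []
    for (x, y) in points:
        x0, x1 = max(0, x - half), min(W, x + half)
        y0, y1 = max(0, y - half), min(H, y + half)
        cuboids.append(frames[:, y0:y1, x0:x1])
    return cuboids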
The approach for detecting those interest points is critical to our method. In terms of
accuracy, manually labeling facial points is reliable for expression recognition, but it
is time-consuming and impractical. Several methods have been proposed for detecting
or tracking facial points, such as active appearance models (AAM), active shape
models (ASM), and elastic bunch graph matching. After comparison, we apply
ASM [25] to detect the facial points.
Geometric information for the first frame is obtained by applying ASM, as shown in
Fig. 1(a). Here, the geometric models are trained on the FRAV2D [26] and MMI [27]
databases.
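Our method applies ASM [25] to the first frame; since ASM implementations vary, the sketch below uses dlib's pretrained 68-point landmark detector purely as a readily available stand-in (a different technique from ASM). The model file name is the standard dlib asset, not part of our approach.

import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(gray):
    """Return (x, y) landmark coordinates for the first detected face."""
    faces = detector(gray)
    if not faces:
        return []
    shape = predictor(gray, faces[0])
    return [(p.x, p.y) for p in shape.parts()]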