Dynamic Facial Expression Recognition Using Boosted Component-Based Spatiotemporal Features and Multi-classifier Fusion - Advanced Concepts for Intelligent Vision Systems

Fig. 1. (a) Results of facial point detection. (b) Components used to compute the spatiotemporal features.

2.2 Component-Based Spatiotemporal Feature Descriptor

Feature extraction is critical to any facial expression recognition system. After the interest points are detected, our approach next considers appearance features. The areas centered at these facial interest points carry more discriminative information, as shown in Fig. 1(b). The size of each area is 32 × 32. Most of the features are concentrated on the eyes and mouth, but the regions near the cheeks and forehead are also included in our approach. If the areas are too small, the features extracted from the forehead, cheeks and eyebrows carry too little discriminative information. If they are too large, the areas around the mouth and eyes overlap heavily, which introduces a lot of redundant information. In our experiments (Sec. 4), we show how the region size influences performance.
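A minimal sketch of the region extraction, assuming the frames are a grayscale NumPy array of shape (T, H, W) and the interest points are given as (x, y) coordinates detected in the first frame. The function name `extract_component_volumes` and the edge padding used at image borders are illustrative choices, not taken from the paper; only the 32 × 32 window size comes from the text.

```python
import numpy as np

def extract_component_volumes(frames, points, size=32):
    """Crop a size x size window centred on each interest point in every frame.

    frames: grayscale sequence, array of shape (T, H, W).
    points: (x, y) interest-point coordinates detected in the first frame;
            the same window position is reused for all frames.
    Returns an array of shape (len(points), T, size, size).
    """
    frames = np.asarray(frames)
    T, H, W = frames.shape
    half = size // 2
    # Assumption: replicate border pixels so windows near the edge keep full size.
    padded = np.pad(frames, ((0, 0), (half, half), (half, half)), mode="edge")
    volumes = []
    for x, y in points:
        # Clamp the point into the image, then shift into padded coordinates:
        # original rows [y - half, y - half + size) become [y, y + size).
        xi = min(max(int(round(x)), 0), W - 1)
        yi = min(max(int(round(y)), 0), H - 1)
        volumes.append(padded[:, yi:yi + size, xi:xi + size])
    return np.stack(volumes)
```

Because the window position is fixed by the first frame, each component becomes a small spatiotemporal volume that the LBP-TOP descriptor can then process.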

LBP-TOP (local binary patterns from three orthogonal planes) was proposed for motion analysis and has shown excellent performance in facial expression recognition [28]. Features extracted with this method effectively describe the appearance, horizontal motion and vertical motion in an image sequence.
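LBP-TOP applies the basic LBP operator in each of its three planes. The sketch below, with hypothetical helper names `lbp_plane` and `lbp_histogram`, shows that operator on a single 2D plane: each of P neighbours on a circle of radius R is compared with the centre pixel, and a neighbour that is greater than or equal to the centre sets bit 2^p. As a simplification, neighbour positions are rounded to the nearest pixel instead of being bilinearly interpolated.

```python
import numpy as np

def lbp_plane(plane, P=8, R=1):
    """LBP codes for every pixel whose full neighbourhood lies inside the plane.

    Simplification: neighbour coordinates on the circle are rounded to the
    nearest pixel rather than bilinearly interpolated.
    """
    plane = np.asarray(plane, dtype=float)
    h, w = plane.shape
    center = plane[R:h - R, R:w - R]
    codes = np.zeros(center.shape, dtype=np.int64)
    for p in range(P):
        angle = 2 * np.pi * p / P
        dy = int(round(-R * np.sin(angle)))  # row offset (rows grow downwards)
        dx = int(round(R * np.cos(angle)))   # column offset
        neighbour = plane[R + dy:h - R + dy, R + dx:w - R + dx]
        # Threshold against the centre and set bit p where neighbour >= centre.
        codes |= (neighbour >= center).astype(np.int64) << p
    return codes

def lbp_histogram(plane, P=8, R=1):
    """Normalised histogram over all 2**P possible codes."""
    codes = lbp_plane(plane, P, R)
    hist = np.bincount(codes.ravel(), minlength=2 ** P).astype(float)
    return hist / hist.sum()
```

For the 3 × 3 patch `[[5, 9, 1], [4, 6, 7], [2, 3, 8]]`, the neighbours 7 (right), 9 (top) and 8 (bottom right) are at least 6, so the code is 1 + 4 + 128 = 133.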

We extend LBP-TOP to describe the spatiotemporal features of 38 components, as illustrated in Fig. 2. In Fig. 2, the XY plane shows the appearance of each component; the XT plane shows horizontal motion, i.e., how one row changes in the temporal domain; and the YT plane shows vertical motion, i.e., how one column changes in the temporal domain. In LBP-TOP, the radii along the X, Y and T axes, denoted R_X, R_Y and R_T, can be chosen independently, and different numbers of neighboring points, denoted P_XY, P_XT and P_YT, can be used in the XY, XT and YT planes. With this notation, LBP-TOP features are denoted LBP-TOP_{P_XY, P_XT, P_YT, R_X, R_Y, R_T}. After each component is detected, its LBP-TOP histogram is computed, and the histograms of all components are concatenated into a single histogram that represents the appearance and motion of the facial expression sequence. In our experiments, the radii along X, Y and T are all set to 3, and the number of neighboring points around the central pixel is set to 8 in all three planes, i.e., LBP-TOP_{8,8,8,3,3,3}. We refer to these features as CSF (Component-based Spatiotemporal Features).
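The CSF computation can be sketched as follows, assuming each component has already been cropped into a (T, H, W) volume. All helper names are hypothetical. Two further assumptions: neighbour positions are rounded to the nearest pixel rather than interpolated, and each plane's histogram keeps all 2^P codes (no uniform-pattern mapping), since the text does not specify one. Only pixels whose neighbourhoods fit inside the volume in all three planes are used as centres, so each volume needs more than 2 × R_T frames.

```python
import numpy as np

def lbp_codes_2d(plane, P, r_row, r_col):
    """LBP codes on a 2D plane, sampling P neighbours on an ellipse with radius
    r_row along axis 0 and r_col along axis 1 (nearest-pixel rounding)."""
    h, w = plane.shape
    center = plane[r_row:h - r_row, r_col:w - r_col]
    codes = np.zeros(center.shape, dtype=np.int64)
    for p in range(P):
        angle = 2 * np.pi * p / P
        d_row = int(round(-r_row * np.sin(angle)))
        d_col = int(round(r_col * np.cos(angle)))
        neighbour = plane[r_row + d_row:h - r_row + d_row,
                          r_col + d_col:w - r_col + d_col]
        codes |= (neighbour >= center).astype(np.int64) << p
    return codes

def lbp_top_histogram(volume, P_xy=8, P_xt=8, P_yt=8, R_x=3, R_y=3, R_t=3):
    """LBP-TOP of one (T, H, W) volume: normalised XY, XT and YT histograms,
    concatenated in that order. Defaults mirror LBP-TOP_{8,8,8,3,3,3}."""
    volume = np.asarray(volume, dtype=float)
    T, H, W = volume.shape
    # XY planes (one per frame, axes y, x): appearance.
    xy = [lbp_codes_2d(volume[t], P_xy, R_y, R_x) for t in range(R_t, T - R_t)]
    # XT planes (one per row y, axes t, x): horizontal motion.
    xt = [lbp_codes_2d(volume[:, y, :], P_xt, R_t, R_x) for y in range(R_y, H - R_y)]
    # YT planes (one per column x, axes t, y): vertical motion.
    yt = [lbp_codes_2d(volume[:, :, x], P_yt, R_t, R_y) for x in range(R_x, W - R_x)]
    hists = []
    for codes, P in ((xy, P_xy), (xt, P_xt), (yt, P_yt)):
        all_codes = np.concatenate([c.ravel() for c in codes])
        counts = np.bincount(all_codes, minlength=2 ** P)
        hists.append(counts / counts.sum())
    return np.concatenate(hists)

def csf_descriptor(component_volumes, **lbp_top_params):
    """Concatenate the per-component LBP-TOP histograms into one feature vector."""
    return np.concatenate(
        [lbp_top_histogram(v, **lbp_top_params) for v in component_volumes])
```

With LBP-TOP_{8,8,8,3,3,3}, each component yields 3 × 2^8 = 768 bins, so the 38 components give 38 × 768 = 29,184 dimensions under the all-codes assumption above; a uniform-pattern mapping would make the vector considerably shorter.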

Component detection for near-frontal faces with pose variation is a challenge for our present implementation, since component extraction is based on the first frame. To solve this problem, we use a simple solution to align face
