Fig. 1. (a) Results of facial point detection. (b) Components for calculating spatiotemporal
features.
2.2 Component-Based Spatiotemporal Feature Descriptor
It is well known that feature extraction is critical to any facial expression recognition
system. After detecting the interest points, our approach next considers the appearance
features. The areas centered at these facial interest points carry more discriminative
information, as shown in Fig. 1(b). The size of each area is 32 × 32. We observe that the
majority of the features are concentrated around the eyes and mouth; the regions near the
cheeks and forehead are also considered in our approach. If the areas are too small, the
features extracted from the forehead, cheeks and eyebrows carry too little discriminative
information. In contrast, if they are too large, most areas near the mouth and eyes overlap
heavily, which introduces too much redundant information. In our experiments (Sec. 4), we
show the influence of the region size.
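The component extraction described above can be illustrated with a minimal sketch. It assumes the facial points are given as (x, y) pixel coordinates inside the first frame and that the same points are reused for every frame of the sequence. The function name `extract_component_volumes`, the edge padding at the image border and the (T, H, W) array layout are choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def extract_component_volumes(frames, points, size=32):
    """Crop a size x size area around each facial point in every frame.

    frames: array of shape (T, H, W), a grayscale image sequence.
    points: iterable of (x, y) coordinates located in the first frame
            (assumed to lie inside the image).
    Returns an array of shape (N, T, size, size), one volume per point.
    """
    frames = np.asarray(frames)
    half = size // 2
    # Pad only the spatial axes so areas near the border keep their full size
    # (illustrative choice; the paper does not say how borders are handled).
    padded = np.pad(frames, ((0, 0), (half, half), (half, half)), mode="edge")
    volumes = []
    for x, y in points:
        r, c = int(round(y)), int(round(x))
        # Original pixel (r, c) sits at (r + half, c + half) after padding,
        # so this window places it at index (half, half) of the crop.
        volumes.append(padded[:, r:r + size, c:c + size])
    return np.stack(volumes)
```

Changing `size` in this sketch is one way to reproduce the trade-off discussed above: smaller areas carry less information per component, while larger ones overlap more around the eyes and mouth.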
LBP-TOP (local binary patterns from three orthogonal planes) was proposed for motion
analysis and has shown excellent performance in facial expression recognition [28].
Features extracted by this method effectively describe the appearance, horizontal motion
and vertical motion of an image sequence.
We extend LBP-TOP to describe the spatiotemporal features of 38 components, as shown in
Fig. 2. In Fig. 2, the XY plane shows the appearance of each component; the XT plane shows
the horizontal motion, which indicates how one row changes in the temporal domain; and the
YT plane shows the vertical motion, which indicates how one column changes in the temporal
domain. For LBP-TOP, the radii along the X, Y and T axes can be varied and are denoted
$R_X$, $R_Y$ and $R_T$. Likewise, different numbers of neighboring points can be used in
the XY, XT and YT planes, denoted $P_{XY}$, $P_{XT}$ and $P_{YT}$. With this notation,
LBP-TOP features are written as LBP-TOP$_{P_{XY},P_{XT},P_{YT},R_X,R_Y,R_T}$. After each
component is detected, the LBP-TOP histograms of all components are computed and
concatenated into a single histogram that represents the appearance and motion of the
facial expression sequence. In our experiments, the radii along the X, Y and T axes are set
to 3, and the number of neighboring points around the central pixel is set to 8 for all
three planes. We abbreviate the resulting Component-based Spatiotemporal Features as CSF.
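To make the descriptor concrete, the following is a simplified sketch of LBP-TOP on one component volume, followed by concatenation over components. It makes several assumptions that the text does not confirm: neighbors on each plane are sampled at the nearest voxel rather than by bilinear interpolation, basic (non-uniform) LBP codes are used, a single neighbor count `p` is shared by all three planes, and only voxels with a full neighborhood in X, Y and T are coded. The function names `lbp_top_histogram` and `csf_descriptor` are hypothetical.

```python
import numpy as np

def _ring_offsets(radius_a, radius_b, n_points):
    """Offsets of n_points samples on an ellipse, rounded to the nearest voxel."""
    angles = 2.0 * np.pi * np.arange(n_points) / n_points
    da = np.rint(radius_a * np.cos(angles)).astype(int)
    db = np.rint(-radius_b * np.sin(angles)).astype(int)
    return list(zip(da, db))

def lbp_top_histogram(volume, p=8, r_x=3, r_y=3, r_t=3):
    """Concatenated, normalized LBP histograms of the XY, XT and YT planes.

    volume: array of shape (T, H, W); each axis must be longer than twice
    its radius so that at least one voxel has a full neighborhood.
    """
    vol = np.asarray(volume, dtype=np.float64)
    t, h, w = vol.shape
    if t <= 2 * r_t or h <= 2 * r_y or w <= 2 * r_x:
        raise ValueError("volume too small for the chosen radii")
    center = vol[r_t:t - r_t, r_y:h - r_y, r_x:w - r_x]
    # Neighbor offsets as (dt, dy, dx) for each orthogonal plane.
    planes = [
        [(0, dy, dx) for dx, dy in _ring_offsets(r_x, r_y, p)],  # XY: appearance
        [(dt, 0, dx) for dx, dt in _ring_offsets(r_x, r_t, p)],  # XT: horizontal motion
        [(dt, dy, 0) for dy, dt in _ring_offsets(r_y, r_t, p)],  # YT: vertical motion
    ]
    hists = []
    for offsets in planes:
        codes = np.zeros(center.shape, dtype=np.int64)
        for bit, (dt, dy, dx) in enumerate(offsets):
            neighbor = vol[r_t + dt:t - r_t + dt,
                           r_y + dy:h - r_y + dy,
                           r_x + dx:w - r_x + dx]
            # Set this bit where the neighbor is at least as bright as the center.
            codes |= (neighbor >= center).astype(np.int64) << bit
        hist = np.bincount(codes.ravel(), minlength=2 ** p).astype(np.float64)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)

def csf_descriptor(component_volumes, **kwargs):
    """Concatenate the LBP-TOP histograms of all component volumes."""
    return np.concatenate([lbp_top_histogram(v, **kwargs) for v in component_volumes])
```

Under these assumptions each plane contributes 2^8 = 256 bins, so one component yields 768 values and 38 components yield 38 × 768 = 29184. A uniform-pattern variant would give a shorter vector; the text does not say which variant is used.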
Detecting components in near-frontal face images with pose variation is a challenge for
our present implementation, since the component extraction is based on the first frame.
To solve this problem, we use a simple solution to align face