of a conformal mapping approach. The vertex correspondence across 3D model sequences
provides a set of motion trajectories (vertex flow) of 3D face scans. Consequently, the vertex
flow can be represented on the adapted generic model (tracking model) by estimating the
displacement vector from the tracked points of the current frame to the corresponding points of
the first frame (assumed to have a neutral expression). A facial motion vector is then obtained to
describe the dynamics of facial expression across a 3D frame sequence. In the spatial analysis,
an automatic surface labeling approach is applied to the tracked locations of the range models
in order to classify the 3D primitive surface features into eight basic categories. As a result, each
depth scan in the sequence can be represented by a spatiotemporal feature vector that describes
both shape and motion information and provides a robust facial surface representation. Once
spatiotemporal features are extracted, a two-dimensional Hidden Markov Model (HMM) is
used for classification: a spatial HMM and a temporal HMM model, respectively, the spatial
and temporal relationships between the extracted features. Exhaustive analysis was performed
on the BU-4DFE database, with a reported average recognition rate of 83.7% for
identity-independent facial expression recognition. The main limitation of this solution is its
reliance on the 83 manually annotated landmarks of the BU-4DFE, which are not publicly
released.
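As a minimal sketch of the vertex-flow computation described above (assuming dense vertex correspondence is already provided by the tracking model; all array and function names here are illustrative, not from the original work):

```python
import numpy as np

def vertex_flow(current, neutral):
    """Displacement vectors from the tracked points of the current
    frame to the corresponding points of the first (neutral) frame.

    current, neutral: (N, 3) arrays with row-wise correspondence
    established by the adapted generic (tracking) model.
    """
    return neutral - current

def facial_motion_vector(current, neutral):
    """Flatten the per-vertex flow into a single motion vector
    describing the dynamics of the current frame."""
    return vertex_flow(current, neutral).ravel()

# Illustrative usage on synthetic data (real input: tracked 3D scans).
rng = np.random.default_rng(0)
neutral = rng.standard_normal((83, 3))          # e.g., 83 tracked points
sequence = [neutral + 0.01 * t * rng.standard_normal((83, 3))
            for t in range(1, 6)]
motion = np.stack([facial_motion_vector(f, neutral) for f in sequence])
print(motion.shape)                             # (5, 249)
```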
The approach proposed by Sandbach et al. (2011) exploits the dynamics of 3D facial
movements to analyse expressions. This is obtained by first capturing motion between frames
using free-form deformations and extracting motion features using a quad-tree decomposition
of several motion fields. GentleBoost classifiers are used to simultaneously select the best
features to use and perform the training using two classifiers for each expression: one for
the onset temporal segment and the other for the offset segment. HMMs are then used for
temporal modeling of the full expression sequence, represented as the composition of four
temporal segments (neutral, onset, apex, offset): an initial neutral segment, followed by
activation of the expression, its maximum intensity, and its deactivation, with the sequence
closing on a neutral expression again. The average correct classification rate for three
prototypic expressions (i.e., happiness, anger, surprise) of the BU-4DFE database is 81.93%.
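To make the quad-tree idea concrete, the following sketch recursively splits a dense 2D motion field into quadrants until each block's motion is nearly homogeneous, keeping the mean motion of each leaf as a feature; the threshold and stopping rule are illustrative assumptions, not those of the paper:

```python
import numpy as np

def quadtree_features(field, threshold=0.05, min_size=4):
    """Quad-tree decomposition of a motion field of shape (H, W, 2):
    split a block into four quadrants while its motion varies more
    than `threshold`, and return the mean motion of each leaf."""
    feats = []

    def split(y, x, h, w):
        block = field[y:y + h, x:x + w]
        mean = block.reshape(-1, 2).mean(axis=0)
        # Stop when the block is small or its motion is nearly uniform.
        if h <= min_size or w <= min_size or \
           np.abs(block - mean).max() < threshold:
            feats.append(mean)
            return
        h2, w2 = h // 2, w // 2
        split(y, x, h2, w2)
        split(y, x + w2, h2, w - w2)
        split(y + h2, x, h - h2, w2)
        split(y + h2, x + w2, h - h2, w - w2)

    split(0, 0, field.shape[0], field.shape[1])
    return np.array(feats)

# Toy field: the lower half of the face region moves along y.
flow = np.zeros((32, 32, 2))
flow[16:, :, 1] = 1.0
print(quadtree_features(flow).shape)  # (4, 2): four homogeneous leaves
```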
In Le et al. (2011), a level-curve-based approach is proposed to capture the shape of 3D
facial models. The level curves are parameterized using the arc-length function, and the Chamfer
distance is used to measure the distances between corresponding normalized segments
partitioned from the level curves of two 3D facial shapes. These measures are then used
as spatiotemporal features to train HMMs; since the training data were not sufficient for
learning the HMMs, the authors applied universal background modeling to overcome
the overfitting problem. Using the BU-4DFE database to evaluate their approach, they reached
an overall recognition accuracy of 92.22% for three prototypic expressions (i.e., happiness,
sadness, surprise).
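The following sketch illustrates arc-length re-parameterization of a level curve and a symmetric Chamfer distance between two point sets; the paper further partitions the normalized curves into corresponding segments, which is omitted here, and all names are illustrative:

```python
import numpy as np

def resample_by_arclength(curve, n=50):
    """Re-parameterize a polyline of shape (M, d) by arc length,
    returning n points uniformly spaced along the curve."""
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))
    t = np.linspace(0.0, s[-1], n)
    return np.stack([np.interp(t, s, curve[:, d])
                     for d in range(curve.shape[1])], axis=1)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: sum of the mean nearest-neighbour
    distances in both directions between point sets a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Toy level curves: two roughly circular facial cross-sections.
theta = np.linspace(0, 2 * np.pi, 200)
c1 = np.stack([np.cos(theta), np.sin(theta)], axis=1)
c2 = 1.1 * c1
print(chamfer_distance(resample_by_arclength(c1),
                       resample_by_arclength(c2)))
```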
The work of Fang et al. (2011) proposes a fully automatic 4D facial expression recognition
approach with a particular emphasis on 4D data registration and dense correspondence between
3D meshes along the temporal line. A variant of the local binary patterns (LBP) descriptor
proposed in Zhao and Pietikainen (2007), which computes LBP on three orthogonal planes
(LBP-TOP), is used as the face descriptor along the sequence. Results are provided on the
BU-4DFE database for all expressions and for the subsets of expressions used in Sandbach
et al. (2011) and Le et al. (2011), showing improved results with respect to competing solutions.
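A minimal sketch of the three-orthogonal-planes idea follows; only the central XY, XT, and YT slices of a (T, H, W) sequence are encoded here, whereas full LBP-TOP implementations histogram every slice and use circular neighbourhoods, and the names are illustrative:

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2D array."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_top(volume):
    """Concatenated LBP histograms on the three orthogonal planes
    (XY, XT, YT) through the centre of a (T, H, W) volume."""
    t, h, w = (s // 2 for s in volume.shape)
    planes = [volume[t], volume[:, h, :], volume[:, :, w]]
    hists = [np.bincount(lbp_codes(p).ravel(), minlength=256)
             for p in planes]
    return np.concatenate(hists)

# Toy dynamic sequence of 16 frames of 32x32 range images.
vol = np.random.default_rng(1).random((16, 32, 32))
print(lbp_top(vol).shape)  # (768,): three 256-bin histograms
```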