face recognition. In (Mayo and Zhang, 2009), SIFT keypoints detected on multiple 2D depth
images have been used to perform 3D face recognition. SIFT descriptors computed on a
sampling grid of points in 2D depth images have been used in (Ohbuchi and Furuya, 2004)
for 3D object retrieval by visual similarity. Finally, SIFT descriptors have also been used in
(Zheng et al., 2009) to perform 2D expression recognition from non-frontal face images.
Building on these studies, in the following we discuss an approach that uses local descriptors of the face to perform person-independent 3D facial expression recognition. This approach was originally proposed in (Berretti et al., 2010e), (Berretti et al., 2010a), and subsequently developed into a completely automatic solution that exploits the local characteristics of the face around a set of automatically detected facial keypoints (Berretti et al., 2011a). In this solution, some facial landmarks are first identified, and SIFT descriptors computed at these landmarks are then combined into a feature vector representing the face. A feature selection approach is applied to these vectors in order to extract the subset of most relevant features, and the selected features are finally classified using SVMs. As the experimental evaluation shows, this approach achieves state-of-the-art results on the BU-3DFE database while relying on only a few automatically detected keypoints and without using neutral scans as a reference.
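The overall chain (SIFT descriptors at detected landmarks, feature selection, SVM classification) can be sketched in Python as follows. This is only a minimal illustration, not the authors' implementation: the keypoint scale, the number of selected features, and the SVM kernel are assumptions.

import cv2
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def face_descriptor(depth_image, keypoints_xy, scale=8.0):
    # Compute one 128-dimensional SIFT descriptor per facial keypoint
    # and concatenate them into a single feature vector for the face.
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), scale) for (x, y) in keypoints_xy]
    _, desc = sift.compute(depth_image, kps)
    return desc.flatten()

def train(X, y, n_features=300):
    # X: one concatenated SIFT vector per training scan; y: expression labels.
    # Keep the most discriminative components, then train an SVM on them.
    selector = SelectKBest(f_classif, k=n_features).fit(X, y)
    clf = SVC(kernel='rbf').fit(selector.transform(X), y)
    return selector, clf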
In the rest of this section, we first briefly present a solution for the automatic identification of facial keypoints. We then address the adaptation of SIFT descriptors to this task, the feature selection approach used to reduce the set of SIFT features, and the SVM-based classification of the selected features. Finally, we summarize the results obtained with this approach.
Automatic Identification of Facial Keypoints
The BU-3DFE database is the standard benchmark for comparing 3D facial expression recognition algorithms (see Section 5.2). However, the fact that this database provides a set of manually identified landmarks, together with the inherent difficulty of automatically detecting the majority of these landmarks, has oriented research towards semi-automatic solutions for 3D facial expression recognition, as illustrated in Section 5.4.2. In semi-automatic solutions, the position of facial landmarks is assumed to be known in order to achieve high facial expression recognition rates (see Section 5.4.1), but this hinders the applicability of these solutions to the general case in which manual annotation of the landmarks in 3D is not available or even possible. To overcome this limitation, a completely automatic solution for identifying fiducial points of the face is proposed in Berretti et al. (2011a); it is briefly reviewed in the following paragraphs.
As a first pre-processing step, the 3D face scans are transformed into depth images, where the gray value of each pixel represents the depth of the corresponding point on the 3D surface. As an example, Figure 5.15 shows the depth images derived from the 3D face scans of the same subject under three different facial expressions.
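A minimal sketch of this conversion is given below, assuming an orthographic projection along the z axis; the resolution and image size are illustrative values, not those of the original work.

import numpy as np

def depth_image(points, res_mm=1.0, size=(160, 160)):
    # points: (N, 3) array of x, y, z coordinates in millimetres.
    # Shift depths so the background stays 0 and all surface depths are positive.
    z = points[:, 2] - points[:, 2].min() + 1.0
    img = np.zeros(size, dtype=np.float32)
    cols = ((points[:, 0] - points[:, 0].min()) / res_mm).astype(int)
    rows = ((points[:, 1] - points[:, 1].min()) / res_mm).astype(int)
    keep = (rows >= 0) & (rows < size[0]) & (cols >= 0) & (cols < size[1])
    # Each pixel keeps the largest depth value, i.e. the visible surface point.
    np.maximum.at(img, (rows[keep], cols[keep]), z[keep])
    valid = img > 0
    v = img[valid]
    img[valid] = (v - v.min()) / (v.max() - v.min() + 1e-9) * 255
    return img.astype(np.uint8)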
On the depth images, the pixel with the maximum gray value is taken as an initial estimate of the tip of the nose. This point is used to crop a rectangular region of the face: following anthropometric statistical measures (Farkas, 1994), the cropped region extends 50 mm to the left and 50 mm to the right of the nose tip, and 70 mm above and 50 mm below it. The cropped region of the face is used in all the subsequent processing steps.
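The nose-tip estimate and the anthropometric crop can be sketched as follows, assuming a depth image with a known resolution in millimetres per pixel; the res_mm parameter is hypothetical.

import numpy as np

def crop_face(depth_img, res_mm=1.0):
    # Initial nose-tip estimate: the pixel with the maximum gray value,
    # i.e. the surface point closest to the camera.
    row, col = np.unravel_index(np.argmax(depth_img), depth_img.shape)
    # Anthropometric crop (Farkas, 1994): 50 mm left and right of the
    # nose tip, 70 mm above and 50 mm below it (row index grows downwards).
    dx = int(50 / res_mm)
    up, down = int(70 / res_mm), int(50 / res_mm)
    return depth_img[max(row - up, 0):row + down,
                     max(col - dx, 0):col + dx]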