A Di3D imaging system (Di3D, 2006) was used to acquire the 3D facial sequences. The system consists of two
pods with three cameras in each pod, and two floodlights. All six cameras have a high-resolution four-megapixel sensor. In each pod, two cameras are used for depth recovery,
whereas the third one captures texture. The maximum recording speed is 60 frames per second.
Each pod generates a range map from its own pair of corresponding stereo images using a
passive stereo-photogrammetry approach and produces a related 3D triangular mesh that
covers approximately half of the facial area. Two 3D meshes from both pods are subsequently
stitched together to form a complete face representation with an RMS (root-mean-square) target
accuracy of approximately 0.5 mm. The system is able to construct a face model covering
nearly a 180° field of view. The participants were asked to perform a number of facial expressions
one after another. Each recorded sequence starts and ends with a neutral facial appearance
and is performed at a specified expression intensity. To reduce the required database storage
space and to simplify time warping between expressions for construction of the common time
reference frame, the participant was asked to complete each expression in less than 10s. For
each sequence, six videos were recorded simultaneously at an image resolution of 2352 × 1728 pixels per frame. The recorded video sequences include four grayscale sequences for
geometry reconstruction and two color sequences for texture mapping.
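The stated RMS target accuracy of roughly 0.5 mm can be made concrete with a short sketch. The function below is a hypothetical illustration, not part of the Di3D software: it computes the root-mean-square of per-vertex Euclidean distances between a reconstructed mesh and a reference, assuming the two vertex sets have already been registered and put into row-wise correspondence.

```python
import numpy as np

def rms_error(reconstructed, reference):
    """Root-mean-square of per-vertex Euclidean distances.

    Both arrays are (N, 3); rows are assumed to correspond,
    e.g. after registering the stitched mesh to a reference scan.
    """
    diffs = reconstructed - reference
    return np.sqrt(np.mean(np.sum(diffs ** 2, axis=1)))

# Illustrative check with synthetic data (units: mm):
# perturb ~20,000 vertices so the expected RMS distance is 0.5 mm.
rng = np.random.default_rng(0)
reference = rng.uniform(-100.0, 100.0, size=(20000, 3))
noise = rng.normal(scale=0.5 / np.sqrt(3.0), size=(20000, 3))
print(rms_error(reference + noise, reference))  # close to 0.5
```

With 20,000 vertices the estimate is tightly concentrated, so the printed value lands very near the 0.5 mm figure quoted for the system.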
Currently there are 80 subjects included in the database. The majority of them, 65, are
undergraduate students from the Performing Arts Department at the University of Central
Lancashire. The rest are undergraduate students, postgraduate students, and members of staff
from other departments of the same university without any specific training in acting. Their ages range from 18 to 60. The database consists of 48 female and 32 male subjects
from a variety of ethnic origins. Each 3D model in a sequence contains approximately 20,000
vertices. For each recorded facial articulation, the following files are included in the database:
(1) a sequence of OBJ files with associated MTL and texture JPG files for each individual frame; (2) a video clip; (3) animated GIF files; and (4) a text file with the 3D positions of 84 tracked landmarks. On average, a sequence showing one of the seven basic expressions lasts about 3s, whereas the mouth/eyebrow articulation and phrase-reading sequences last about 6s and 10s, respectively.
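Since each frame is stored as a Wavefront OBJ file, the per-frame geometry can be read with a few lines of standard parsing. The sketch below is a minimal, hypothetical reader that extracts only vertex positions (the `v x y z` records of the OBJ format), ignoring faces, normals, texture coordinates, and the MTL reference; it assumes nothing about the database's file naming.

```python
def load_obj_vertices(path):
    """Read vertex positions from a Wavefront OBJ file.

    Only 'v x y z' records are parsed; face ('f'), normal ('vn'),
    and texture-coordinate ('vt') records are skipped.
    """
    vertices = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts and parts[0] == "v":
                vertices.append(tuple(float(c) for c in parts[1:4]))
    return vertices
```

Applied to one frame of a sequence, this would return a list of roughly 20,000 (x, y, z) tuples, matching the vertex count quoted above.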
5.3 3D Face Recognition
Because facial biometrics are natural, contactless, and non-intrusive, they have emerged as one of the most attractive modalities for identity recognition. Unfortunately, 2D-based face recognition technologies
still face difficult challenges, such as pose variations, changes in lighting conditions, occlusions, and facial expressions. In contrast, the 3D output of laser scanners depends only minimally on external environmental factors and provides faithful measurements of the shapes of facial surfaces. In that case, the only remaining variability manifested within the same class (i.e., within the measurements of the same person) is the one introduced by changes in facial
expressions. In fact, changes induced by facial expressions modify the shapes of facial surfaces
to some extent and introduce a nuisance variability that has to be accounted for in shape-based
3D face recognition. We argue that the variability introduced by facial expressions has become
one of the most important issues in 3D face recognition. The other important issues relate to
data collection and imperfections introduced in that process. It is difficult to obtain a pristine, continuous facial surface, or a mesh representing such a surface, with the current laser