Face Aging Modeling (Face Image Modeling and Representation) (Face Recognition) Part 1

Introduction

Face recognition accuracy is typically limited by the large intra-class variations caused by factors such as pose, lighting, expression, and age [16]. Therefore, most of the current work on face recognition is focused on compensating for the variations that degrade face recognition performance. However, facial aging has not received adequate attention compared to other sources of variations such as pose, lighting, and expression.

Facial aging is a complex process that affects both the shape and texture (e.g., skin tone or wrinkles) of a face. The aging process manifests differently across age groups, genders, and ethnicities. While facial aging is mostly represented by facial growth in younger age groups (e.g., below 18 years of age), it is mostly represented by relatively large texture changes and minor shape changes (e.g., due to change of weight or stiffness of skin) in older age groups (e.g., over 18 years of age). Therefore, an age-invariant face recognition scheme needs to compensate for both types of aging processes.

Some of the face recognition applications where age invariance or correction is required include (i) identifying missing children, (ii) watch list screening, and (iii) detecting multiple enrollments. These three scenarios have two common characteristics: (i) a significant age difference exists between probe and gallery images (images obtained at verification and enrollment stages, respectively) and (ii) an inability to obtain a user’s face image to update the template (gallery). Identifying missing children is one of the most apparent applications where age compensation is needed to improve the recognition performance. In screening applications, aging is a major source of difficulty in identifying suspects in a watch list. Repeat offenders commit crimes at different time periods in their lives, often starting as juveniles and continuing throughout their lives. It is not unusual to encounter a time lapse of ten to twenty years between the first (enrollment) and subsequent (verification) arrests. Multiple enrollment detection for issuing government documents such as driver licenses and passports is a major problem that various government and law enforcement agencies face in the facial databases that they maintain. Face or other biometric traits (e.g., fingerprint or iris) are the only reliable means of detecting multiple enrollments.


Ling et al. [10] studied how age differences affect the face recognition performance in a real passport photo verification task. Their results show that the aging process does increase the recognition difficulty, but it does not surpass the challenges posed due to change in illumination or expression. Studies on face verification across age progression [19] have shown that: (i) simulation of shape and texture variations caused by aging is a challenging task, as factors like life style and environment also contribute to facial changes in addition to biological factors, (ii) the aging effects can be best understood using 3D scans of human head, and (iii) the available databases to study facial aging are not only small but also contain uncontrolled external and internal variations (e.g., pose, illumination, expression, and occlusion). It is due to these reasons that the effect of aging in facial recognition has not been as extensively investigated as other factors that lead to large intra-class variations in facial appearance.

Some biological and cognitive studies on the face aging process have also been conducted, see [18, 25]. These studies have shown that cardioidal strain is a major factor in the aging of facial outlines. Such results have also been used in psychological studies, for example, by introducing aging as caricatures generated by controlling 3D model parameters [12]. Patterson et al. [15] compared automatic aging simulation results with forensic sketches and showed that further studies in aging are needed to improve face recognition techniques. A few seminal studies [20, 24] have demonstrated the feasibility of improving face recognition accuracy by simulated aging. There has also been some work in the related area of age estimation using statistical models, for example, [8, 9]. Geng et al. [7] learn a subspace of aging patterns based on the assumption that similar faces age in similar ways. Their face representation is composed of the face texture and the 2D shape represented by the coordinates of feature points, as in Active Appearance Models. The computer graphics community has also proposed facial aging modeling methods in the 3D domain [22], but the effectiveness of these aging models was not evaluated through a face recognition test.

Table 10.1 gives a brief comparison of various methods for modeling aging proposed in the literature. The performance of these models is evaluated in terms of the improvement in the identification accuracy. When multiple accuracies were reported in any of the studies under the same experimental setup (e.g., due to different choice of probe and gallery), their average value is listed in Table 10.1; when multiple accuracies are reported under different approaches, the best performance is reported. The identification accuracies of various studies in Table 10.1 cannot be directly compared due to the differences in the database, the number of subjects and the underlying face recognition method used for evaluation. Usually, the larger the number of subjects and the larger the database variations in terms of age, pose, lighting and expression, the smaller the recognition performance improvement by an aging model. The identification accuracy for each approach in Table 10.1 before aging simulation indicates the difficulty of the experimental setup for the face recognition test as well as the limitations of the face matcher.

Table 10.1 A comparison of various face aging models [13]

Approach | Face matcher | Database (#subjects, #images) in probe and gallery | Rank-1 identification accuracy (%): original image → after aging model

Ramanathan et al. (2006) [20]: shape growth modeling up to age 18 | PCA | Private database (109, 109) | 8.0 → 15.0

Lanitis et al. (2002) [8]: build an aging function in terms of PCA coefficients of shape and texture | Mahalanobis distance, PCA | Private database (12, 85) | 57.0 → 68.5

Geng et al. (2007) [7]: learn aging pattern on concatenated PCA coefficients of shape and texture across a series of ages | Mahalanobis distance, PCA | FG-NET* (10, 10) | 14.4 → 38.1

Wang et al. (2006) [26]: build an aging function in terms of PCA coefficients of shape and texture | PCA | Private database (NA, 2000) | 52.0 → 63.0

Patterson et al. (2006) [14]: build an aging function in terms of PCA coefficients of shape and texture | PCA | MORPH+ (9, 36) | 11.0 → 33.0

Park et al. [13]: learn aging pattern based on PCA coefficients in separate 3D shape and texture given a 2D database | FaceVACS |
    FG-NET** (82, 82) | 26.4 → 37.4
    MORPH-Album1++ (612, 612) | 57.8 → 66.4
    BROWNS (4, 4) probe, (100, 100) gallery | 15.6 → 28.1
* Used only a very small subset of the FG-NET database, which contains a total of 82 subjects.

+ Used only a very small subset of the MORPH database, which contains a total of 625 subjects.

** Used all the subjects in FG-NET.

++ Used all the subjects in MORPH-Album1 which have multiple images.

Compared with other published approaches, the aging model proposed by Park et al. [13] has the following features.

•    3D aging modeling: Includes a pose correction stage and a more realistic model of the aging pattern in the 3D domain. Considering that aging is a 3D process, 3D modeling is better suited to capturing aging patterns. Since no 3D aging database is currently available, reconstructing 3D models from a 2D aging database is the only viable alternative to building a 3D aging model directly. Scanned (rather than reconstructed) 3D face data are used in [22], but they were not collected for aging modeling and hence do not contain as much aging information as 2D facial aging databases.

•    Separate modeling of shape and texture changes: Three different modeling methods, namely, shape modeling only, separate shape and texture modeling, and combined shape and texture modeling (e.g., applying a second-level PCA to remove the correlation between shape and texture after concatenating the two types of feature vectors) were compared. It has been shown that separate modeling is better than the combined modeling method, given the FG-NET database as the training data.

•    Evaluation using a state-of-the-art commercial face matcher, FaceVACS: A state-of-the-art face matcher, FaceVACS from Cognitec [4] has been used to evaluate the aging model. Their method can thus be useful in practical applications requiring an age correction process. Even though their method has been evaluated only on one particular face matcher, it can be used directly in conjunction with any other 2D face matcher.

•    Diverse Databases: FG-NET has been used for aging modeling and the aging model has been evaluated on three different databases: FG-NET (in a leave-one-person-out fashion), MORPH, and BROWNS. Substantial performance improvements have been observed on all three databases.

The rest of this topic is organized as follows: Sect. 10.2 introduces the preprocessing step of converting 2D images to 3D models, Sect. 10.3 describes the aging model, Sect. 10.4 presents the aging simulation methods using the aging model, and Sect. 10.5 provides experimental results and discussions. Section 10.6 summarizes the conclusions and lists some directions for future work.

Preprocessing

Park et al. propose to use a set of 3D face images to learn the model for recognition, because the true craniofacial aging model [18] can be appropriately formulated only in 3D. However, since only 2D aging databases are available, it is necessary to first convert these 2D face images into 3D. The major notations used in the following sections are defined first.

•    {S_mm}: a set of 3D face models used in constructing the reduced morphable model; n_mm is the number of 3D face models.

•    S_mm(α): reduced morphable model represented with the model parameter α.

•    s2d_{i,j}: 2D facial feature points for the ith subject at age j; n_2d is the number of points in 2D.

•    s_{i,j}: 3D feature points for the ith subject at age j; n_3d is the number of points in 3D.

•    t_{i,j}: facial texture for the ith subject at age j.

•    s^r_{i,j}: reduced shape of s_{i,j} after applying PCA on {s_{i,j}}.

•    t^r_{i,j}: reduced texture of t_{i,j} after applying PCA on {t_{i,j}}.

•    V_s: top L_s principal components of {s_{i,j}}.

•    V_t: top L_t principal components of {t_{i,j}}.

•    s^{w_s}_j: synthesized 3D facial feature points at age j represented with weight w_s.

•    t^{w_t}_j: synthesized texture at age j represented with weight w_t.

In the following subsections, s2d_{i,j} is first transformed to s_{i,j} using the reduced morphable model S_mm(α). Then, the 3D shape aging pattern space and the texture aging pattern space are constructed using {s_{i,j}} and {t_{i,j}}, respectively.

2D Facial Feature Point Detection

Manually marked feature points are used in aging model construction. However, in the test stage the feature points need to be detected automatically. The feature points on 2D face images are detected using the conventional Active Appearance Model (AAM) [3, 23]. AAM models for the three databases are trained separately, the details of which are given below.

FG-NET

Face images in the FG-NET database have already been (manually) marked by the database provider with 68 feature points. These feature points are used to build the aging model. Feature points are also detected automatically, and the face recognition performance based on manual versus automatic feature point detection is compared. The training and feature point detection are conducted in a cross-validation fashion.

MORPH

Unlike the FG-NET database, a majority of the face images in the MORPH database belong to African-Americans. These images are not well represented by an AAM model trained on the FG-NET database due to differences in cranial structure between the Caucasian and African-American populations. Therefore, a subset of 80 images in the MORPH database is labeled and used as a training set for the automatic feature point detector on the MORPH database.

BROWNS

The entire FG-NET database is used to train the AAM model for detecting feature points on images in the BROWNS database.

3D Model Fitting

A simplified deformable model based on Blanz and Vetter’s model [2] is used as a generic 3D face model. For efficiency, the number of vertices in the 3D morphable model is drastically reduced to 81, of which 68 correspond to the salient feature points marked in the FG-NET database, while the other 13 delineate the forehead region. Following [2], PCA was performed on the simplified shape sample set {S_mm}. The mean shape S̄_mm, the eigenvalues λ_l, and the unit eigenvectors W_l of the shape covariance matrix are obtained. Only the top L (= 30) eigenvectors are used, again for efficiency and for the stability of the subsequent fitting algorithm performed on a possibly very noisy dataset. A 3D face shape can then be represented using the eigenvectors as

S(α) = S̄_mm + Σ_{l=1}^{L} α_l W_l,    (10.1)

where the parameter α = (α_1, ..., α_L)^T controls the shape, and the covariance matrix of α is diagonal with the λ_l as its diagonal elements. A description is given below of how to transform the given 2D feature points s2d_{i,j} to the corresponding 3D points s_{i,j} using the reduced morphable model S_mm(α).
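The linear shape model above can be sketched in a few lines. The following is an illustrative NumPy sketch, not the authors' implementation; the dimensions (81 vertices, L = 30) come from the text, while the random data standing in for the learned mean and eigenvectors are hypothetical:

```python
import numpy as np

def synthesize_shape(mean_shape, eigvecs, alpha):
    """Generate a 3D shape S = S_mean + sum_l alpha_l * W_l.

    mean_shape: (3*n3d,) mean of the 3D shape samples
    eigvecs:    (3*n3d, L) top-L unit eigenvectors of the shape covariance
    alpha:      (L,) model parameters controlling the shape
    """
    return mean_shape + eigvecs @ alpha

# Toy stand-ins for the learned model (81 vertices, L = 30).
rng = np.random.default_rng(0)
mean_shape = rng.normal(size=3 * 81)
eigvecs, _ = np.linalg.qr(rng.normal(size=(3 * 81, 30)))  # orthonormal columns
alpha = np.zeros(30)                                      # alpha = 0 yields the mean shape
```

Setting all α_l to 0 recovers the mean shape, which is exactly the initialization used by the fitting procedure described later.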

Let E(·) be the overall error in fitting the 3D model of one face to its corresponding 2D feature points:

E = || s2d_{i,j} − T(S_mm(α)) ||²,    (10.2)

Here, T(·) represents a transformation operator performing a sequence of operations, that is, rotation, translation, scaling, projection, and selecting n2d points out of n3d that have correspondences. To simplify the procedure, an orthogonal projection P is used.

In practice, the 2D feature points that are either manually labeled or automatically generated by AAM are noisy, which means overfitting these feature points may produce undesirable 3D shapes. This issue is addressed by introducing a Tikhonov regularization term to control the Mahalanobis distance of the shape from the mean shape. Let σ be the empirically estimated standard deviation of the energy E induced by the noise in the location of the 2D feature points. The regularized energy is defined as

E_r = E + σ² Σ_{l=1}^{L} α_l² / λ_l.    (10.3)
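Because the regularized energy is quadratic in α, for a fixed pose the minimizing coefficients have a closed-form ridge-regression solution. The sketch below illustrates this under a simplifying assumption (the matrix W stands for the eigenvector basis with the pose and projection already folded in, which is an assumption for illustration, not the paper's exact formulation):

```python
import numpy as np

def fit_alpha(y, W, eigvals, sigma):
    """Minimize ||y - W a||^2 + sigma^2 * sum_l a_l^2 / lambda_l in closed form.

    The Tikhonov term penalizes the Mahalanobis distance of the shape
    from the mean shape, keeping the fit away from implausible shapes.
    """
    reg = (sigma ** 2) * np.diag(1.0 / eigvals)   # prior term from the eigenvalues
    return np.linalg.solve(W.T @ W + reg, W.T @ y)

# Toy check: with sigma -> 0 the fit approaches plain least squares.
rng = np.random.default_rng(1)
W = rng.normal(size=(40, 5))
eigvals = np.ones(5)
alpha_true = rng.normal(size=5)
y = W @ alpha_true
alpha_hat = fit_alpha(y, W, eigvals, sigma=1e-6)
```

Larger σ shrinks the recovered α toward 0 (the mean shape), which is the desired behavior when the 2D feature points are noisy.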

Fig. 10.1 3D model fitting process using the reduced morphable model [13]

Fig. 10.2 Four example images with manually labeled 68 points (blue) and the automatically recovered 13 points (red) for the forehead region [13]

To minimize the energy term defined in (10.3), all the α_l’s are initialized to 0, the rotation matrix R is set to the identity matrix, the translation vector t is set to 0, and the scaling factor is set to match the overall size of the 2D and 3D shapes. Then, R, t, the scale, and α are iteratively updated until convergence. There are multiple ways to find the optimal pose given the current α. In these tests, it was found that first estimating the best 2×3 affine transformation and then applying a QR decomposition to obtain the rotation works better than running a quaternion-based optimization using Rodrigues’ formula [17]. Note that t_z is fixed to 0, as an orthogonal projection is used.
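The affine-then-QR pose step above can be made concrete as follows. This is a simplified sketch assuming centered points and an orthographic projection, with made-up test data, and not the exact procedure of [13]:

```python
import numpy as np

def pose_from_affine(X3d, x2d):
    """Estimate an orthographic pose from 2D-3D correspondences (sketch).

    Fit the best 2x3 affine map with x2d ~ X3d @ A (least squares), then
    orthogonalize A via QR to separate a rotation-like factor from the scale.
    """
    A, *_ = np.linalg.lstsq(X3d, x2d, rcond=None)  # X3d: (n,3), x2d: (n,2) -> A: (3,2)
    Q, R = np.linalg.qr(A)                         # Q: (3,2) with orthonormal columns
    scale = np.mean(np.abs(np.diag(R)))            # average per-axis scale
    return Q.T, scale                              # 2x3 row-orthonormal projection, scale

# Hypothetical ground truth: scale times the first two rows of a z-rotation.
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
P_true = 2.0 * Rz[:2]                              # 2x3 scaled orthographic projection
X = np.random.default_rng(2).normal(size=(50, 3))
P_est, s = pose_from_affine(X, X @ P_true.T)
```

The QR step guarantees the recovered 2×3 factor has orthonormal rows, which is what makes this more stable than directly optimizing rotation parameters on noisy correspondences.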

Figure 10.1 illustrates the 3D model fitting process to acquire the 3D shape. The associated texture is then retrieved by warping the 2D image. Figure 10.2 shows the manually labeled 68 points and automatically recovered 13 points that delineate the forehead region.

Aging Pattern Modeling

Following [7], the aging pattern is defined as an array of face models from a single subject, indexed by age. This model construction differs from [7] mainly in that the shape and texture are modeled separately at different ages using the shape (aging) pattern space and the texture (aging) pattern space, respectively, because the 3D shape and the texture images are less correlated than the 2D shape and texture used in [7]. The two pattern spaces, as well as the adjustment of the 3D shape, are described below.

Shape Aging Pattern

Shape pattern space captures the variations in the internal shape changes and the size of the face. The pose corrected 3D models obtained from the pre-processing phase are used for constructing the shape pattern space. Under age 19, the key effects of aging are driven by the increase of the cranial size, while at later ages the facial growth in height and width is very small [1]. To incorporate the growth pattern of the cranium for ages under 19, the overall size of 3D shape is rescaled according to the average anthropometric head width found in [5].

PCA is applied over all the 3D shapes s_{i,j} in the database, irrespective of age j and subject i. Each mean-subtracted shape s_{i,j} − S̄ is projected onto the subspace spanned by the columns of V_s to obtain s^r_{i,j} as

s^r_{i,j} = V_s^T (s_{i,j} − S̄),    (10.4)

which is an L_s × 1 vector.
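Computing the basis V_s and the reduced shapes amounts to a standard PCA projection. A minimal sketch with toy data (the dimensions and random shapes are hypothetical; the actual basis is learned from the aging database):

```python
import numpy as np

def pca_project(samples, L):
    """Project mean-subtracted vectors onto the top-L principal components,
    giving reduced coefficients s_r = V^T (s - mean) for each sample."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # SVD of the centered data matrix yields the principal directions.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    V = Vt[:L].T                 # (dim, L) basis, columns are principal components
    reduced = centered @ V       # each row is one L-dimensional reduced shape
    return mean, V, reduced

# Toy data: 20 shape samples, 81 vertices x 3 coordinates each.
rng = np.random.default_rng(3)
shapes = rng.normal(size=(20, 243))
mean, V, red = pca_project(shapes, L=5)
recon = mean + red @ V.T         # back-projection to the full shape space
```

The same projection/back-projection pair is reused later when synthesizing shapes from the aging pattern space.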

Assuming that there are n subjects at m ages, the basis of the shape pattern space is assembled as an m × n matrix with vector entries (or, alternatively, as an m × n × L_s tensor), where the jth row corresponds to age j, the ith column corresponds to subject i, and the entry at (j, i) is s^r_{i,j}. The shape pattern basis is initialized with the projected shapes s^r_{i,j} from the face database (as shown in the third column of Fig. 10.3). Then, missing values are filled in using the available values along the same column (i.e., for the same subject). Three different methods are tested for the filling process: linear interpolation, Radial Basis Functions (RBF), and a variant of RBF (u-RBF). Given the available ages a_i and the corresponding shape feature vectors s_i, a missing feature vector s_x at age a_x can be estimated in linear interpolation as s_x = l_1 s_1 + l_2 s_2, where s_1 and s_2 are the shape feature vectors corresponding to the two available ages a_1 and a_2 closest to a_x, and l_1 and l_2 are weights inversely proportional to the distances from a_x to a_1 and a_2. In the u-RBF process, each missing feature vector is replaced by a weighted sum of all available features, s_x = Σ_i φ(|a_x − a_i|) s_i, where φ(·) is an RBF defined by a Gaussian function. In the RBF method, the mapping function from age to shape feature vector is calculated as s_x = Σ_i T_i φ(|a_x − a_i|) for each available age a_i and feature vector s_i, where the coefficients T_i are estimated from the known scattered data. Any missing feature vector s_x at age a_x can thus be obtained.
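The linear filling rule can be illustrated directly; the sketch below interpolates between the two nearest available ages with inverse-distance weights (the example ages and feature values are made up for illustration):

```python
import numpy as np

def fill_linear(ages, feats, target_age):
    """Linearly interpolate a missing feature vector at target_age from
    the two nearest available ages, with inverse-distance weights."""
    ages = np.asarray(ages, dtype=float)
    order = np.argsort(np.abs(ages - target_age))
    i1, i2 = order[:2]                            # two closest available ages
    a1, a2 = ages[i1], ages[i2]
    if a1 == a2:
        return feats[i1]
    l1 = abs(a2 - target_age) / abs(a2 - a1)      # weight inversely prop. to distance
    l2 = abs(target_age - a1) / abs(a2 - a1)
    return l1 * feats[i1] + l2 * feats[i2]

# Toy column of a pattern space: features available at ages 10 and 20.
feats = np.array([[0.0, 0.0], [10.0, 20.0]])
out = fill_linear([10, 20], feats, 15)
```

The RBF variants replace these two-point weights with Gaussian-kernel weights over all available ages of the subject, but the fill-in-by-column idea is the same.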

Fig. 10.3 3D aging model construction [13]

 

The shape aging pattern space is defined as the space containing all the linear combinations of the patterns of the following type (expressed in the PCA basis):

s^{w_s}_j = Σ_{i=1}^{n} w_{s,i} s^r_{i,j}.    (10.5)

Note that the weight w_s in the linear combination above is not unique for the same aging pattern. A regularization term can be used in the aging simulation described below to resolve this issue. Given a complete shape pattern space, the mean shape S̄, and the transformation matrix V_s, the shape aging model with weight w_s is defined as

S^{w_s}_j = S̄ + V_s s^{w_s}_j.    (10.6)
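Given a complete pattern space, synthesizing a shape at a target age is a weighted combination over subjects followed by back-projection into the full shape space. A sketch under the assumption that the pattern space is stored as an m × n × L_s array (all dimensions and data below are toy values):

```python
import numpy as np

def synthesize_aged_shape(mean_shape, Vs, pattern, age_idx, ws):
    """Synthesize a 3D shape at a given age from the shape pattern space:
    S = mean + Vs @ (sum_i ws_i * s_r[age, i]).

    pattern: (m_ages, n_subjects, Ls) complete shape pattern space
    ws:      (n_subjects,) mixing weights over the training subjects
    """
    reduced = np.tensordot(ws, pattern[age_idx], axes=1)  # (Ls,) combined coefficients
    return mean_shape + Vs @ reduced                      # back-project to shape space

# Toy pattern space: 5 ages, 4 subjects, Ls = 3, shape dimension 12.
rng = np.random.default_rng(4)
mean = rng.normal(size=12)
Vs = rng.normal(size=(12, 3))
pattern = rng.normal(size=(5, 4, 3))
ws = np.zeros(4); ws[2] = 1.0              # one-hot weights reproduce subject 2
out = synthesize_aged_shape(mean, Vs, pattern, 1, ws)
```

With one-hot weights the synthesis reproduces a single training subject's pattern at that age; general weights blend subjects, which is what aging simulation exploits.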

Texture Aging Pattern

The texture pattern t_{i,j} for subject i at age j is obtained by mapping the original face image to the frontal projection of the mean shape S̄, followed by a column-wise concatenation of the image pixels. After applying PCA on {t_{i,j}}, the transformation matrix V_t and the projected textures t^r_{i,j} are calculated. The same filling procedure as in the shape pattern space is used to construct the complete basis of the texture pattern space using t^r_{i,j}. A new texture t^{w_t}_j can be similarly obtained, given an age j and a set of weights w_t, as

t^{w_t}_j = T̄ + V_t Σ_{i=1}^{n} w_{t,i} t^r_{i,j},    (10.7)

where T̄ is the mean texture.

Figure 10.3 illustrates the aging model construction process for both shape and texture pattern spaces.

Separate and Combined Shape & Texture Modeling

Given s^r_{i,j} and t^r_{i,j}, they can either be used directly for aging modeling, or another step of PCA can be applied to the concatenated feature vector [s^r_{i,j}; t^r_{i,j}].

Applying PCA to the concatenated vectors generates a new set of eigenvectors, V_c. Modeling with s^r_{i,j} and t^r_{i,j} used separately is called “separate shape and texture modeling,” and modeling with the second-level PCA coefficients is called “combined shape and texture modeling.”
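The second-level ("combined") PCA can be sketched as follows; the dimensions and data are toy values, and the variable names (`s_r`, `t_r`, `Vc`) are illustrative rather than taken from [13]:

```python
import numpy as np

def combined_basis(s_r, t_r, Lc):
    """Second-level PCA over concatenated [shape; texture] coefficients,
    removing the residual correlation between the two feature sets."""
    joint = np.hstack([s_r, t_r])            # (n_samples, Ls + Lt) concatenated features
    mean = joint.mean(axis=0)
    _, _, Vt = np.linalg.svd(joint - mean, full_matrices=False)
    Vc = Vt[:Lc].T                           # top-Lc combined eigenvectors
    coeffs = (joint - mean) @ Vc             # combined coefficients per sample
    return mean, Vc, coeffs

# Toy reduced features: 30 samples, Ls = 8 shape and Lt = 16 texture coefficients.
rng = np.random.default_rng(5)
s_r = rng.normal(size=(30, 8))
t_r = rng.normal(size=(30, 16))
mean, Vc, c = combined_basis(s_r, t_r, Lc=6)
```

Whether this decorrelation helps depends on how correlated shape and texture actually are; as noted above, with 3D shape the correlation is weak, which is why the separate modeling performed better on FG-NET.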
