Skin Color in Face Analysis (Face Image Modeling and Representation) (Face Recognition) Part 2

Non-canonical Images and Colors

If images are not taken under the illumination used in camera calibration, the colors are distorted even more. The distortion appears as a shift in colors, as can be seen in Fig. 9.7, which displays images taken under four different light sources while the camera is calibrated to one of them. In the upper image series, the camera was calibrated to the light source Horizon (first image on the left), and the light source was then changed to incandescent A, TL84 and daylight, respectively. In the lower image series, the camera was calibrated to daylight (first image on the right), and the images were then taken under TL84, A and Horizon.

The skin color tends to shift in the direction of the illumination color change. A more reddish prevailing illumination causes a color shift towards red, while a more bluish one adds blue components. Of course, a light source with strong spikes in its spectrum can cause additional distortions for certain colors. Since cameras have limited dynamic response ranges, colors can also be distorted due to saturation or under-exposure.

Fig. 9.8 The skin NCC chromaticities were simulated using the data of the Sony camera (see Fig. 9.2) and the skin reflectances from Fig. 9.4. a shows the possible skin chromaticities when the camera was calibrated to a Planckian of 3000 K and b when the calibration illumination was a Planckian of 6000 K. The chromaticity range depends on the calibration illumination and the possible color temperature range of prevailing illuminations

Manual or automatic brightness control in the camera can alleviate this problem, but manual operation tends to be tedious, and automatic control can cause problems of its own.

Figure 9.8 shows simulated skin chromaticities using only one calibration. The chromaticity range obtained depends on the calibration light and on the color temperature range of the prevailing illumination. The possible range of skin colors (the skin locus [44]) is affected by the number of calibrations. Figure 9.8 shows that different white balancing illuminants have dissimilar ranges of possible skin chromaticities and produce separate skin loci. When the loci of all different calibrations are gathered together, a bigger locus is obtained, as shown in Fig. 9.9. Of course, the illumination range as well as different camera settings affect the locus size.

Separating Sources of Skin Data

Many materials, like inks and dyes, are used to imitate the appearance of skin. Some studies have already been done to examine how well the imitation works and how real skin can be separated from imitations.

The skin data can come from different sources, like real faces, photos or print [37]. The source often cannot be determined from normal RGB data, so spectral data is needed. An interesting spectral region is the near infrared. Figure 9.10 shows near infrared spectra of three different skin complexions for real faces, facial skin in photos and facial skin in print. The spectra from photos and prints, which are flat, are clearly different from those of real faces. Thus, a simple ratio between two channels can be used to separate real skin from the other sources. The level differences between complexions in the real-skin spectra diminish as the wavelength increases. The skin complexion groups are separable in the print spectra, but not in the photo spectra.
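The two-channel ratio test can be sketched as follows; this is a minimal illustration, and the function name, the flatness threshold and the sample reflectance values are assumptions for demonstration, not measurements from the chapter.

```python
def is_real_skin(reflectance_a, reflectance_b, flatness_tol=0.15):
    """Classify a sample as real skin if the ratio between reflectances
    measured in two near-infrared channels deviates clearly from 1.

    Photos and prints have flat NIR spectra, so their two-channel
    ratio stays close to 1; real skin does not.
    """
    ratio = reflectance_a / reflectance_b
    return abs(ratio - 1.0) > flatness_tol

# Illustrative reflectance values only (not measured data):
print(is_real_skin(0.55, 0.15))  # real skin: strong spectral slope -> True
print(is_real_skin(0.48, 0.50))  # print: nearly flat spectrum -> False
```

The threshold would in practice be tuned on labeled spectra for the specific pair of wavelengths used.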

Fig. 9.9 The skin locus formed with all prevailing illumination/white balancing combinations

Fig. 9.10 Near infrared skin spectra from real faces (left), photos (middle) and paper (right)

A skin-like color appearance is also sought for mannequins, but it is clearly different from real skin [1]. Kim et al. [24] have studied the differences between masked fake faces and real skin. They concluded that the wavelengths of 685 nm and 850 nm can be used to discriminate between them.

Modeling Skin Colors

A skin color model is a description of the possible skin tones. To create such a model, one has to first select the color space in which the model is formed, then the mathematical model to describe the possible skin colors, and finally, the data upon which the model is defined. The performance of the model depends on all these factors and is a trade-off between the generality of the model and its accuracy for a certain image.

Skin detection methods have been compared in several studies using different data [22, 35, 46]. The studies disagree, which may be because the optimal model depends on its purpose, the data and material used, and the modeling parameters.

Behavior of Skin Complexions in Different Color Spaces Under Varying Illumination

The color space in which skin data is processed also has an effect on detection. Not all color spaces are equal: they can map RGB values differently, which can be used to separate certain colors. Even a mixture of color spaces can be used, as in [47], at least for canonical or nearly canonical images.

As mentioned earlier, a color space conversion does not remove chromaticity shifts due to illumination or effects caused by noise. In fact, noise can be detrimental for low RGB values or near thresholds. Brightness control, or the lack of it, can have a strong effect on the possible skin chromaticities. If there is no automatic brightness or gain control, it is possible for one channel to have low values or even be clipped at the low end. Therefore, the skin colors have been studied under varying illuminations [34].

RGB coordinates are device-oriented, but they can be converted into human vision oriented spaces like XYZ or CIE Lab. A correct conversion requires an illumination-dependent transform matrix, which also includes the effect of device characteristics. Of course, general transform matrices exist. None of the matrix transforms reduces the effect of changing light, since the light has already affected the RGB values.

The more device-oriented color spaces can be classified, based on the conversion method, into two groups: those using linear transforms from RGB and those obtained via nonlinear transforms. For example, linear transform based color spaces include I1I2I3, YES, YIQ, YUV, and YCrCb (Rec. 601-5 and 709). Among the nonlinear transforms are NCC rgb, modified rgb, natural logarithm ln-chromaticity, P1P2, l1l2l3, ratios between channels (G/R, B/R, and B/G), HSV, HSL, modified ab, TLS and Yuv.
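As a minimal illustration of the two groups, the sketch below implements one linear transform (a YCbCr conversion using approximate Rec. 601 coefficients) and one nonlinear descriptor set (simple channel ratios); the function names and the assumption of RGB values in [0, 1] are my own choices for demonstration.

```python
def rgb_to_ycbcr_601(r, g, b):
    """Linear transform: YCbCr (approximate Rec. 601 matrix)
    from RGB values in [0, 1]."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr =  0.500 * r - 0.419 * g - 0.081 * b
    return y, cb, cr

def rgb_ratios(r, g, b):
    """Nonlinear descriptors: simple ratios between channels
    (assumes nonzero R and G)."""
    return g / r, b / r, b / g

# A neutral gray maps to zero chroma under the linear transform:
print(rgb_to_ycbcr_601(1.0, 1.0, 1.0))  # (1.0, ~0.0, ~0.0)
```

The channel ratios, like NCC chromaticities, discard one degree of freedom and are insensitive to uniform intensity scaling; the linear YCbCr transform keeps intensity in a separate component instead.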

The overlap between different skin complexions varies across color spaces. In [34], the overlaps between two complexions (pale and yellowish) were compared in different color spaces and across different cameras: the overlaps between them were reasonably high in all color spaces (ranging from 50 to 75 percent) when using different canonical images. When using both canonical and uncanonical images, the overlap increased further, because more colors fall into the region. However, when comparing skin data from different cameras, the overlaps between skin RGBs were smaller and depended on the cameras used in the comparison. Therefore, one can argue that the color spaces and cameras used do have an effect on skin detection and thus on face recognition.

Color Spaces for Skin

Several color spaces have been suggested for general skin color modeling, but thus far, none of them has been shown to be superior to the others. Lists of comparison studies for color spaces can be found, for example, in [33] or [22]. However, it seems that those spaces in which intensity is not separated clearly from chromaticity behave similarly to RGB. The separation can be evaluated using linear or linearized RGB data: RGB is transformed into the color space using the substitution (R, G, B) → (cR, cG, cB), in which c describes a uniform change in the intensity levels. If the factor c does not cancel out of the chromaticity descriptors, the separation is incomplete.

Normalized color coordinates (NCC) are quite often used in modeling, and they separate intensity from chromaticity. To avoid intensity changes, only the chromaticity coordinates are used. In [46], different color spaces were compared in terms of efficiency and transferability of the model. The performance of NCC and CIE xy was superior to that of several other skin color models. It was also shown in [55] that NCC has good discrimination power. More details on color spaces for skin detection can be found in [33] or [22].

A color can be uniquely defined by its intensity and two chromaticity coordinates, since r + g + b = 1. The chromaticity coordinates for the NCC color space are defined as

r = R/(R + G + B),    g = G/(R + G + B).

The intensity is canceled from chromaticity coordinates since they are calculated by dividing the descriptor value of the channel by the sum of all descriptor values (intensity) at that pixel.
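A minimal sketch of the conversion; the function name and the zero-return convention for black pixels are implementation choices for illustration, not part of the definition.

```python
def ncc_chromaticity(r, g, b):
    """Normalized color coordinates: return the r and g chromaticities.
    b = 1 - r - g is redundant, so two coordinates suffice."""
    s = r + g + b
    if s == 0:
        return 0.0, 0.0   # convention for black pixels (undefined chromaticity)
    return r / s, g / s

# A uniform intensity change (R,G,B) -> (cR,cG,cB) cancels out:
print(ncc_chromaticity(100, 80, 60))  # (~0.4167, ~0.3333)
print(ncc_chromaticity(50, 40, 30))   # same chromaticity at half the intensity
```

This cancellation is exactly the property discussed above: dividing each channel by the per-pixel intensity removes the uniform factor c.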

The modeling can be done using only the chromaticity coordinates to reduce the effect of illumination intensity changes, which are common in videos and images. Some models do include intensity (like in [14]), but more data is needed to construct such a model and the computational costs increase due to the third component.

Skin Color Model and Illumination

Section 9.3 showed that illumination affects skin color in both canonical and uncanonical images. What is more, this dependency is camera-specific: the camera sensors and the internal image preprocessing of the camera affect the color production and thus the end result (see Fig. 9.11). Therefore, creating a universal model is difficult.

Many face detection algorithms assume that the images are taken under canonical or near canonical conditions. For many data sets, this is true. An example of this kind of image data set is a set of personal photos.

When the illumination varies, the previous approaches have a high risk of failure. Of course, the images can be subjected to a color correction or color constancy algorithm, but sometimes this can lead to even more serious color distortions [35].

Fig. 9.11 The camera and its properties determine the skin locus, as indicated by the loci of four cameras. However, some regions are common to all, most notably the region of skin tones

A color correction based approach has been suggested, for example, by Hsu et al. [15]: the colors in the image are corrected so that the skin appears in skin tones, and the image is then segmented using a skin color model. The color correction is based on pixels with high brightness values, which are assumed to belong to a white object. These pixels are used to calculate correction coefficients, which are then applied to the image. This approach can fail for many reasons, such as data loss due to saturation, or when the pixels with the highest brightness belong to a nonwhite object. The latter case is demonstrated in Fig. 9.12.
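A simplified sketch of this kind of reference-white correction; the `top_fraction` parameter and the equal-channel-means target are assumptions for illustration, not the exact procedure of Hsu et al.

```python
def white_patch_correct(pixels, top_fraction=0.05):
    """Reference-white correction: assume the brightest pixels belong
    to a white object and scale each channel so that their means
    become equal. `pixels` is a list of (R, G, B) tuples in [0, 255]."""
    ranked = sorted(pixels, key=lambda p: sum(p), reverse=True)
    bright = ranked[:max(1, int(len(ranked) * top_fraction))]
    means = [sum(p[c] for p in bright) / len(bright) for c in range(3)]
    target = sum(means) / 3.0
    gains = [target / m if m > 0 else 1.0 for m in means]
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3))
            for p in pixels]

# A bluish cast on a white object is neutralized:
corrected = white_patch_correct([(200, 200, 250), (200, 200, 250),
                                 (100, 100, 125)])
```

The failure mode described above follows directly: if the brightest pixels actually belong to, say, yellow curtains, the computed gains shift every color in the image the wrong way.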

For a more general skin model, one should use knowledge of the illumination changes, calibration and camera settings, as in the skin locus based approach [43]. The drawback of this model is that it is not as specific as the canonical models, since more color tones are included. Thus, more nonskin objects will be accepted as skin candidates. Since color itself is rarely enough to determine whether the target is skin or not, the face candidates are in any case subjected to further processing.

Mathematical Models for Skin Color

The model for skin color can be either a mathematically defined area in a color space or a statistical model in which a probability of belonging to skin is attached to each color tone. The model may be fixed or adaptive, and in the latter case, the update depends on whether it is applied to single images or to video frames. A more detailed review can be found, for example, in [33] or [22].

Fig. 9.12 The upper row displays the color segmentation results using the model of Hsu et al. [15] without the color correction part. The lower row shows the segmentation with their color correction method. The color correction fails because the yellow curtains have the highest brightness values and are assumed to be a white object

The area based approach uses a spatial constraint in the color space to define the possible skin areas. The shape of the constraint can be given by simple thresholds, as in [3], or by a more complex shaped function, as in [15]. Generally, no further thresholding is needed, since the colors that fall inside the area are considered skin. These models often assume that the skin has, or can be corrected to have, a skin tone appearance. An exception is the skin locus, in which the illumination changes are included in the model.
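An area-based model in its simplest form, a rectangular region in NCC chromaticity space, could look like the sketch below; the bounds are illustrative placeholders, not values from the chapter or from any of the cited models.

```python
def in_skin_area(r_chroma, g_chroma,
                 r_range=(0.35, 0.47), g_range=(0.28, 0.36)):
    """Area-based skin model: a pixel is a skin candidate if its NCC
    chromaticities fall inside a fixed rectangular region.
    The default bounds are illustrative placeholders only."""
    return (r_range[0] <= r_chroma <= r_range[1]
            and g_range[0] <= g_chroma <= g_range[1])

print(in_skin_area(0.40, 0.31))  # True: inside the illustrative region
print(in_skin_area(0.33, 0.33))  # False: r chromaticity too low
```

A skin-locus model replaces the rectangle with a curved region that covers the chromaticities reachable under the expected illumination range.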

It is possible to adapt the model even for single images (e.g., [3, 26, 45]), although the success depends on the validity of the assumptions behind the adaptation criteria. The adaptation schemes generally start from a general skin model obtained from a representative image set and then fine-tune it into an image specific model. For example, in Cho et al. [3], the fine-tuning phase assumes that the skin color histogram is unimodal and that skin color occurs mainly on real skin areas. This approach can fail if the image has a dominant skin-colored, nonfacial object or if the histogram is not unimodal.

The challenge of the probability based approach is to find the probability distribution of skin colors reliably. This requires collecting a representative data set of images for forming the model. An example of a statistical model is the one presented by Jones and Rehg [21]. They calculated histogram and Gaussian models using over 1 billion labeled pixels. Many other statistical models, like SOMs or neural networks, have been suggested, and a review of them can be found, for example, in [33] or [22]. In addition to the statistical model, one has to determine the threshold for separating skin from nonskin. It is difficult to find the threshold value automatically, because the probability model found may not be valid for all images.
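A toy version of such a histogram-based model might look like this; the bin count, the equal-prior assumption and the handful of training pixels are illustrative only (real models, as noted above, use on the order of a billion labeled pixels).

```python
from collections import Counter

def quantize(rgb, bins=32):
    """Map an RGB triple in [0, 255] to a coarse histogram bin."""
    step = 256 // bins
    return tuple(c // step for c in rgb)

def skin_probability(rgb, skin_hist, nonskin_hist):
    """P(skin | color) from two color histograms via Bayes' rule
    (equal class priors assumed for simplicity)."""
    key = quantize(rgb)
    p_skin = skin_hist[key] / max(1, sum(skin_hist.values()))
    p_nonskin = nonskin_hist[key] / max(1, sum(nonskin_hist.values()))
    if p_skin + p_nonskin == 0:
        return 0.0
    return p_skin / (p_skin + p_nonskin)

# Toy training data, for illustration only:
skin_hist = Counter(quantize(p) for p in [(200, 140, 120)] * 9 + [(60, 60, 60)])
nonskin_hist = Counter(quantize(p) for p in [(60, 60, 60)] * 9 + [(200, 140, 120)])

print(skin_probability((201, 141, 121), skin_hist, nonskin_hist))  # high
print(skin_probability((61, 59, 62), skin_hist, nonskin_hist))     # low
```

Turning the probability into a decision still requires the threshold discussed above, and the right value depends on how well the training distribution matches the image at hand.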

Fig. 9.13 Two consecutive frames are taken from a video sequence (the first and second image from the left). The facial skin areas of the frames are manually extracted and their skin RGB values are then converted to the NCC chromaticity space. The chromaticities from these two frames are marked with different colors in the right image. As can be observed from the rightmost image, the chromaticities overlap significantly

Video Sequences

The processing of video sequences is similar to that of single, independent images. Thus, the skin detection methods presented earlier can also be used for videos. Fixed skin color models are suitable for videos in which the changes in illumination are minimal. Generally, this is not the case, and the skin color model needs to be updated. The model adaptation often relies on the dependency between consecutive frames, which holds for many videos: consecutive frames exhibit strong sequential dependency. This can be observed in Fig. 9.13: the overlap between the chromaticities from two consecutive frames is significant.

If the illumination changes between images are slow (no abrupt, drastic object color changes) or the person moves slowly enough in a nonuniform illumination field, the skin color model can adapt to the color changes. This requires some constraint for selecting the pixels used in the model update. Three different adaptive schemes have been suggested: two of them use spatial constraints [39, 57] (see Fig. 9.14) and one uses the skin locus [35]. The basic idea is the same: to use some constraint to select the pixels for model updating. The spatial constraints use different ideas to select candidate pixels from a located face: the method of Raja et al. [39] updates the skin color model using pixels inside the localized face area. The pixels are selected from an area whose size is 1/3 of the localization area and which lies 1/3 of the way in from the localization boundaries. Yoo and Oh [57] argued that the localization should resemble the shape of the object (a face), and they used all pixels inside an elliptical face localization. The skin locus can be used in two ways: either the whole locus or a partial locus is used to select skin colored pixels from the localized face and its near surroundings.
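The inner-box pixel selection described above can be sketched as follows; the function name and the integer arithmetic are assumptions for illustration of a Raja-et-al.-style constraint.

```python
def inner_update_box(x, y, w, h):
    """Select the pixels used for model updating in a spatial-constraint
    scheme: an inner box whose sides are 1/3 of the localized face box,
    inset 1/3 of the box size from each boundary.
    (x, y) is the top-left corner of the localization, (w, h) its size."""
    return (x + w // 3, y + h // 3, w // 3, h // 3)

print(inner_update_box(30, 60, 90, 120))  # (60, 100, 30, 40)
```

The elliptical constraint of Yoo and Oh would instead test each pixel of the localization against an inscribed ellipse rather than a centered box.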

Fig. 9.14 Spatial constraints suggested for adaptive skin color modeling: the left image shows the method suggested by Raja et al. [39]. The outer box indicates the localized face while the pixels inside the inner box are used for model updating. The image on the right shows elliptical constraint by Yoo and Oh [57]

There are many possible methods for updating the skin color model, but perhaps the most common is the moving average, as presented in (9.5):

M_t = α · m_t + (1 − α) · M_(t−1)    (9.5)

where M_t is the new, refreshed model, m_t is the model computed from the current frame t, M_(t−1) is the model from the previous frame, and α is a weighting factor. Quite often, the weighting factor is set to 0.5 to give equal emphasis to the skin color models of the current and the previous frames. The moving average method provides a smooth transition between the models from different frames. It also reduces the effect of noise, which can change pixel colors without any variation in external factors and would thus be detrimental to the models.
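For a histogram-type model, the moving-average update could be sketched as below; representing the model as a dict from chromaticity bins to weights is an assumed simplification, not a prescription from the chapter.

```python
def update_model(prev_model, frame_model, alpha=0.5):
    """Moving-average update of a histogram-type skin color model:
    each model maps chromaticity bins to weights, and alpha weights
    the model computed from the current frame."""
    keys = set(prev_model) | set(frame_model)
    return {k: alpha * frame_model.get(k, 0.0)
               + (1.0 - alpha) * prev_model.get(k, 0.0)
            for k in keys}

m_prev = {(10, 8): 0.6, (11, 8): 0.4}
m_cur  = {(10, 8): 0.2, (12, 9): 0.8}
m_new = update_model(m_prev, m_cur)  # m_new[(10, 8)] is ~0.4
```

With alpha = 0.5, bins seen in only one of the two models are halved rather than dropped, which gives the smooth transition and noise damping described above.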

However, the spatial constraint models have been shown to be very sensitive to localization errors, and therefore they can easily adapt to nonskin objects [35]. Failure due to these constraints can happen even under a fairly moderate illumination change. In Fig. 9.15, the method of Raja et al. has failed while tracking a face in a video sequence, and the skin color model has adapted to a nonskin colored target, as shown in the image.

The constraint suggested by Raja et al. easily fails under a nonuniform change of the illumination field, as demonstrated in Fig. 9.16. The model is updated using the pixels inside the localization, and therefore it can adapt only to global illumination changes, not to nonuniform variations of the illumination field.

The correct localization of the face is not as critical for a skin locus based approach, since the nonskin colored pixels can be filtered out. Large skin colored objects connected to the face remain problematic, however, and cues other than color are needed to resolve them.
