Gabor-Like Image Filtering for Transient Feature Detection and Global Energy Estimation Applied to Multi-expression Classification (Computer Vision, Imaging and Computer Graphics) Part 2

Feature-Based Processing of Emotional Segments

Feature-based processing combines the information resulting from the permanent and the transient facial feature deformations to recognize the facial expression during each emotional segment.

Permanent Facial Feature Information. The permanent facial feature behavior is measured based on the work of Hammal et al. [1]. First, the face and the permanent facial features (eyes, eyebrows and mouth) are automatically segmented (see [11] and Fig. 5.a). Second, five characteristic distances Di, 1 ≤ i ≤ 5, coding the displacement of a set of selected facial feature points relative to the neutral state, are measured (Fig. 5.b; see [1] for a detailed explanation of this choice). Facial expressions are then characterized by the behavior of these characteristic distances.
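As a rough illustration, the characteristic distances can be computed as Euclidean distances between pairs of segmented feature points and compared to their neutral-state values. The landmark names and pairings below are hypothetical placeholders; the actual five distances are defined in [1].

```python
import numpy as np

def characteristic_distances(landmarks, pairs):
    """Euclidean distances between selected facial feature points.

    landmarks: dict mapping point names to (x, y) coordinates.
    pairs: list of (name_a, name_b) tuples, one per distance D_i.
    """
    return np.array([np.linalg.norm(np.asarray(landmarks[a]) - np.asarray(landmarks[b]))
                     for a, b in pairs])

# Hypothetical pairings for illustration only; see [1] for the real definitions.
PAIRS = [("eyebrow_inner", "eye_inner"),   # eyebrow-to-eye distance
         ("eye_top", "eye_bottom"),        # eye opening
         ("mouth_left", "mouth_right"),    # mouth width
         ("mouth_top", "mouth_bottom"),    # mouth opening
         ("mouth_corner", "eye_outer")]    # mouth corner to eye

def distance_ratios(current, neutral):
    """Each D_i expressed relative to its value in the neutral frame."""
    return np.asarray(current, dtype=float) / np.asarray(neutral, dtype=float)
```

The ratios, rather than the raw pixel distances, are what the later numerical-to-symbolic conversion operates on, since they factor out face scale.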


Fig. 5. (a) Example of facial features segmentation [11]; (b) Associated characteristic distances [1]


Transient Facial Feature Information. In addition to the permanent facial features [1], and in order to provide additional information to support the recognition of facial expressions, a new method is introduced for the automatic detection and behavior analysis of transient facial features such as the nasal root wrinkles (Fig. 6, Area 1) and the nasolabial furrows (Fig. 6, Areas 2 and 3), which are among the most important visual cues used by human observers for facial expression recognition [20]. The transient feature areas are first located based on the segmentation of the permanent facial features (Fig. 6). Transient features correspond to the appearance of oriented segments with different 3D shapes and thicknesses (i.e., different spatial frequencies); they can therefore be measured by a set of filters at different spatial frequencies and orientations. The filtering-based method proposed in Section 2.1 is then applied inside each selected area to estimate their appearance (characterized by an increase of the local energy) and, when necessary, the corresponding orientation. Fig. 7 shows the different processing steps.


Fig. 6. (a) Detected wrinkle regions; (b) Transient feature areas. R: eye radius; W: distance between the eye corners

After the selection of each area of interest (Fig. 7.a), a Hamming window is applied to each area (Fig. 7.b). The response of each orientation band B_j^t, which corresponds to the sum of the responses of all filters sharing the same central orientation at different spatial frequencies (Fig. 7.c, grey), is measured as:

B_j^t = Σ_i ∬ S^t(f, θ) G_{i,j}(f, θ) df dθ

where S^t(f, θ) is the Fourier power spectrum of the current frame t (expressed in polar coordinates) and G_{i,j} is the transfer function of the filter at spatial frequency i and orientation j (see equation 1).

The use of orientation bands allows analyzing transient features independently of their spatial frequency making the detection more robust to individual morphological differences.
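The band summation above can be sketched as follows. This is a minimal illustration assuming the power spectrum and the filter transfer functions are sampled on the same polar grid; the array layout is an assumption, not the authors' implementation.

```python
import numpy as np

def orientation_band_responses(power_spectrum, filter_bank):
    """Energy per orientation band B_j: the filter responses summed over
    all spatial frequencies i sharing the same central orientation j.

    power_spectrum: 2-D array, S^t(f, theta) on a polar grid.
    filter_bank: array of shape (n_freqs, n_orients, H, W) holding the
                 transfer functions G_{i,j} on the same (H, W) grid.
    """
    # Response of each filter (i, j): integral of S * G over the grid.
    responses = (filter_bank * power_spectrum).sum(axis=(-2, -1))
    # Sum over the frequency index i -> one energy value per orientation j.
    return responses.sum(axis=0)
```

Because each B_j pools every spatial frequency, the resulting orientation profile is insensitive to how thick or fine an individual's wrinkles are, which is exactly the robustness to morphological differences noted above.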

Wrinkles Detection. For each frame t, nasal root wrinkles and nasolabial furrows are detected based on the total energy E^t, the sum over all orientation bands j inside each selected area:

E^t = Σ_j B_j^t

Wrinkles are “present” if E^t is higher than a predefined threshold and “absent” otherwise. The threshold values on the energy measure are obtained after a learning process over three benchmark databases (the Cohn-Kanade, Dailey-Cottrel and STOIC databases) and generalized to the Hammal-Caplier and MMI databases. Table 2 in the results section shows the detection performance. The obtained results reach a sufficient precision to reinforce the information already provided by the permanent facial features.


Fig. 7. Transient features detection and orientation estimation

Nasolabial Furrows Orientation. Once the nasolabial furrows are detected, their orientation (the angle between their edge line and the horizontal plane defined by the line joining the irises’ centers) is measured by a linear combination of the orientation bands’ responses as:

θ^t = (Σ_j θ_j B_j^t) / (Σ_j B_j^t)

where θ_j is the central orientation of band j.

Fig. 8 shows examples of dynamic detection of nasolabial furrows and nasal root wrinkles during sequences of happiness and disgust expressions. One can see that nasal root wrinkles appear for disgust but not for happiness. Nasolabial furrows appear for both expressions, but their orientations differ according to the expression. These examples show the usefulness of these wrinkles for characterizing the corresponding facial expressions. Moreover, the precision of the orientation estimation (see Table 4) is high enough to discriminate between the different expressions in which these transient features appear.
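One plausible form for the linear combination above is an energy-weighted mean of the filters' central orientations, sketched below; the exact combination used in the paper may differ.

```python
import numpy as np

def furrow_orientation(band_energies, band_orientations_deg):
    """Estimate the furrow angle as the energy-weighted mean of the
    orientation bands' central orientations (one plausible linear
    combination of the band responses; an assumption, not the paper's
    exact formula).
    """
    w = np.asarray(band_energies, dtype=float)
    theta = np.asarray(band_orientations_deg, dtype=float)
    return float((w * theta).sum() / w.sum())
```

Weighting by band energy lets bands that respond strongly to the furrow's edge dominate the estimate, so the result interpolates between the discrete filter orientations rather than snapping to the nearest one.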


Fig. 8. Example of nasolabial furrows and nasal roots detection during happiness (a) and disgust (b) sequences; gray temporal windows (second rows) indicate the temporal presence of the transient features based on the energy threshold; third rows display the measured angles of the nasolabial furrows (around 60° for happiness and 45° for disgust)

Numerical to Symbolic Conversion of Facial Feature Behavior

A numerical-to-symbolic conversion translates the measured distances, transient features and corresponding angles into symbolic states reflecting their behavior relative to the neutral state. First, the value of each characteristic distance Di is coded with five symbolic states (based on the work of [1]) reflecting the magnitude of the corresponding deformation: S if Di is roughly equal to its value in the neutral expression; C+ (resp. C-) if Di is significantly higher (resp. lower) than its value in the neutral expression; and S ∪ C+ (resp. S ∪ C-) if Di is neither sufficiently higher (resp. lower) to be in C+ (resp. C-), nor sufficiently stable to be in S. Following the symbolic association on the permanent features [1], a set of symbolic states is introduced to encode the nasal root and nasolabial furrow behaviors: Pj for “present” or Aj for “absent”, 1 ≤ j ≤ 2, according to the corresponding energy measure described in Section 2.2. The explicit doubt state Pj ∪ Aj (Pj or Aj) is introduced to model the uncertainty of their appearance (see Section 3.1). Finally, two symbolic states are introduced for the nasolabial furrows’ angles: Op for “opened” and Cl for “closed”; if the angle is higher (resp. lower) than a predefined value, the state Op (resp. Cl) is chosen. As for the wrinkle detection, a doubt state Op ∪ Cl is also introduced to model the uncertainty of the measured angles (see Section 3.1). The numerical-to-symbolic conversion is carried out using the functions depicted in Fig. 9 for each sensor. Each facial expression is thus defined by a specific combination of symbolic states. Table 1 summarizes the characteristic distance, transient feature and nasolabial furrow angle states for each facial expression.
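The conversion can be sketched as threshold functions with explicit doubt zones between the "stable" and "confident" decisions. All threshold values below are illustrative placeholders; the actual conversion functions are those depicted in Fig. 9.

```python
def distance_state(d, d_neutral, stable_tol=0.05, sure_tol=0.15):
    """Map a characteristic distance D_i to one of five symbolic states
    relative to its neutral value. Tolerances are illustrative, not the
    learned thresholds of the paper.
    """
    r = d / d_neutral - 1.0          # relative deviation from neutral
    if abs(r) <= stable_tol:
        return "S"                   # roughly equal to neutral
    if r >= sure_tol:
        return "C+"                  # significantly higher
    if r <= -sure_tol:
        return "C-"                  # significantly lower
    # In between: explicit doubt between stable and confident states.
    return "S u C+" if r > 0 else "S u C-"

def angle_state(angle_deg, open_thr=55.0, closed_thr=50.0):
    """Map a nasolabial furrow angle to Op / Cl, with an explicit doubt
    state Op u Cl between the two (illustrative) thresholds."""
    if angle_deg > open_thr:
        return "Op"
    if angle_deg < closed_thr:
        return "Cl"
    return "Op u Cl"
```

The doubt states (S ∪ C+, Op ∪ Cl, …) are not errors to be resolved here; they are carried forward as first-class hypotheses for the Transferable Belief Model described next.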

However, a logic-like system is not sufficient to model facial expressions. An automatic facial expression system should explicitly model the doubt and uncertainty of the sensors (such as the Pj ∪ Aj states), generating its conclusion with a confidence that reflects the uncertainty of the sensor detection and tracking. For this reason, the Transferable Belief Model is used.

Table 1. Rules table defining the visual cue states corresponding to each facial expression

