The appearance features are based on the skin texture of a face and describe appearance changes such as wrinkles and furrows. They can be obtained from the intensity distributions of the pixels in a facial image; for instance, Gabor wavelets [10] and local binary patterns [11] are widely used features of this type.
The geometric features describe the shapes and locations of facial components such as the eyebrows, eyes, and mouth. For example, 3D face models have been used to accurately describe and recognize facial expressions [12][13]. Such models can properly describe facial structures and are effective for accurate facial expression recognition. In lifelog video retrieval, however, it is difficult to prepare 3D facial features at reasonable cost. By using several salient facial feature points (e.g., the end points of the mouth and the center points of the eyes), the facial features can be made more concise. These features are defined by the positional relationships of the facial feature points, such as the distance between two points and the angle between two line segments formed by connecting three points [14][15][16]. In this study, we adopt geometric features represented by the positional relationships of a few facial feature points because of their conciseness and better understandability.
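The two positional relationships mentioned above can be computed directly from point coordinates. The following sketch illustrates both: the Euclidean distance between two points, and the angle between two line segments formed by connecting three points. The example coordinates (mouth corners and mouth center) are hypothetical and chosen only for illustration.

```python
import math

def distance(p, q):
    """Euclidean distance between two facial feature points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle(a, b, c):
    """Angle (in radians) at point b between segments b-a and b-c,
    i.e., the angle formed by connecting the three points a, b, c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    return math.acos(dot / (n1 * n2))

# Hypothetical pixel coordinates: left mouth corner, mouth center,
# right mouth corner (not taken from the actual feature point set).
left, center, right = (100, 200), (130, 210), (160, 200)

mouth_width = distance(left, right)        # 60.0
corner_angle = angle(left, center, right)  # angle at the mouth center
```

Features of this kind are simple scalars, so a frame can be summarized by a short feature vector, which keeps the per-frame recognition cost low.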
Most facial expression recognition methods are supervised, and supervised learning requires sufficient training data. Because preparing training data demands considerable human effort, it is desirable to construct facial expression models in an unsupervised manner. Unsupervised facial expression recognition methods exist that are based on unsupervised machine learning techniques such as principal component analysis [17][18]. Considering that lifelog video databases can be very large, the facial expression recognition process should also be highly efficient. Although efficient facial expression recognition and emotional scene detection methods have been proposed [19], their accuracy is not adequate. In this study, we aim to develop an unsupervised emotional scene detection method that considers both accuracy and efficiency.
3 Facial Features
Prior to emotional scene detection, facial expression recognition is performed on each frame image of a video. To discriminate facial expressions, we define several facial features on the basis of the positional relationships of several salient points on the face (we call them facial feature points).
3.1 Facial Feature Points
We utilize a total of 59 facial feature points. They are located on the eyebrows (10 points), the eyes (22 points), the nose (9 points), the mouth (14 points), and the nasolabial folds (4 points), as shown in Fig. 1. The facial feature points are obtained by using a software application called FaceSDK 4.0 [20] and are denoted by p1, ..., p59.
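A minimal sketch of how the 59 points could be organized in code is given below. The per-region point counts follow the text (10 + 22 + 9 + 14 + 4 = 59), but the assignment of consecutive index ranges to regions is an assumption for illustration, not the actual FaceSDK numbering.

```python
# Point counts per facial component, as stated in the text.
REGION_SIZES = {
    "eyebrows": 10,
    "eyes": 22,
    "nose": 9,
    "mouth": 14,
    "nasolabial_folds": 4,
}

def region_indices(sizes):
    """Assign consecutive 1-based indices p_1 ... p_59 to each region.
    (Assumed ordering; the real FaceSDK index layout may differ.)"""
    indices, start = {}, 1
    for name, size in sizes.items():
        indices[name] = list(range(start, start + size))
        start += size
    return indices

regions = region_indices(REGION_SIZES)
total_points = sum(REGION_SIZES.values())  # 59
```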