The appearance features are based on the skin texture of a face and describe appearance changes such as wrinkles and furrows. They can be obtained from the intensity distributions of the pixels in a facial image; for instance, Gabor wavelets [10] and local binary patterns [11] are widely used features of this type.
The geometric features describe the shapes and locations of facial components such as the eyebrows, eyes, and mouth. For example, 3D face models have been used to accurately describe and recognize facial expressions [12][13]. Such models can properly describe facial structures and are effective for accurate facial expression recognition. In lifelog video retrieval, however, it is difficult to prepare 3D facial features at reasonable cost. By using several salient facial feature points (e.g., the end points of the mouth and the center points of the eyes), the facial features can be made more concise. These features are defined by the positional relationships of the facial feature points, such as the distance between two points and the angle between two line segments formed by connecting three points [14][15][16]. In this study, we adopt geometric features represented by the positional relationships of a few facial feature points because of their conciseness and better understandability.
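The two positional relationships mentioned above can be computed directly from point coordinates. The following sketch illustrates both: the Euclidean distance between two points, and the angle between two line segments formed by connecting three points. The example coordinates (mouth corners and mouth center) are hypothetical and chosen only for illustration.

```python
import math

def distance(p, q):
    """Euclidean distance between two facial feature points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle(a, b, c):
    """Angle (in radians) at point b between segments b-a and b-c,
    i.e., the angle formed by connecting the three points a, b, c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    return math.acos(dot / (n1 * n2))

# Hypothetical pixel coordinates: left mouth corner, mouth center,
# right mouth corner (not taken from the actual feature point set).
left, center, right = (100, 200), (130, 210), (160, 200)

mouth_width = distance(left, right)        # 60.0
corner_angle = angle(left, center, right)  # angle at the mouth center
```

Features of this kind are simple scalars, so a frame can be summarized by a short feature vector, which keeps the per-frame recognition cost low.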
Most facial expression recognition methods are supervised, and supervised learning requires sufficient training data. Because preparing training data demands considerable human effort, it is desirable to construct facial expression models in an unsupervised manner. Unsupervised facial expression recognition methods exist that are based on unsupervised machine learning techniques such as principal component analysis [17][18]. Considering that lifelog video databases can be very large, the facial expression recognition process should also be highly efficient. Although efficient facial expression recognition and emotional scene detection methods have been proposed [19], their accuracy is not adequate. In this study, we aim to develop an unsupervised emotional scene detection method that considers both accuracy and efficiency.
3 Facial Features
Prior to emotional scene detection, facial expression recognition is performed on each frame image of a video. To discriminate facial expressions, we define several facial features on the basis of the positional relationships of several salient points on the face (we call them facial feature points).
3.1 Facial Feature Points
We utilize a total of 59 facial feature points. They are located on the eyebrows (10 points), the eyes (22 points), the nose (9 points), the mouth (14 points), and the nasolabial folds (4 points), as shown in Fig. 1. The facial feature points are obtained by using a software application called FaceSDK 4.0 [20] and are denoted by p1, ..., p59.
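A minimal sketch of how the 59 points could be organized in code is given below. The per-region point counts follow the text (10 + 22 + 9 + 14 + 4 = 59), but the assignment of consecutive index ranges to regions is an assumption for illustration, not the actual FaceSDK numbering.

```python
# Point counts per facial component, as stated in the text.
REGION_SIZES = {
    "eyebrows": 10,
    "eyes": 22,
    "nose": 9,
    "mouth": 14,
    "nasolabial_folds": 4,
}

def region_indices(sizes):
    """Assign consecutive 1-based indices p_1 ... p_59 to each region.
    (Assumed ordering; the real FaceSDK index layout may differ.)"""
    indices, start = {}, 1
    for name, size in sizes.items():
        indices[name] = list(range(start, start + size))
        start += size
    return indices

regions = region_indices(REGION_SIZES)
total_points = sum(REGION_SIZES.values())  # 59
```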