To exploit the complementary information among all classifiers, we investigated
three decision rules (mean rule, product rule, and median rule). A detailed derivation
of these decision rules from Eqn. 5 and Bayes' theorem can be found, e.g., in [19].
Assuming that all classifiers are statistically independent and that the prior
probabilities of occurrence of the i-th class model are equal, the multi-classifier
fusion rule simplifies to
$$\text{Assign } X \rightarrow \{\omega_t = j\} \quad \text{if} \quad \operatorname*{DecisionRule}_{k \in \{1,\dots,R\}} P(\omega_t = j \mid X, D_k) = \max_{i \in \{1,\dots,C\}} \operatorname*{DecisionRule}_{k \in \{1,\dots,R\}} P(\omega_t = i \mid X, D_k) \tag{6}$$
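For illustration, below is a minimal NumPy sketch of Eqn. 6 for the three rules; the function and array names are ours, not from the paper:

```python
import numpy as np

def fuse_and_decide(posteriors, rule="mean"):
    """Fuse per-classifier posteriors and assign the class of Eqn. 6.

    posteriors: array of shape (R, C), where posteriors[k, i] is
    P(omega_t = i | X, D_k) from the k-th classifier.
    """
    P = np.asarray(posteriors, dtype=float)
    if rule == "mean":
        fused = P.mean(axis=0)
    elif rule == "product":
        fused = P.prod(axis=0)
    elif rule == "median":
        fused = np.median(P, axis=0)
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return int(np.argmax(fused))  # index j of the winning class
```

With R classifiers over C = 6 expressions, for example, fuse_and_decide(P, "product") realizes the product rule.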
As shown in Fig. 3, many popular classifiers, such as SVM, can output a voting
vector that represents the number of votes for each class. We denote by V_i the
voting number of the i-th class from the k-th classifier D_k.
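A voting vector of this kind can be accumulated from the pairwise decisions of a one-vs-one scheme; the sketch below assumes a hypothetical pairwise_decide interface and is only illustrative:

```python
from itertools import combinations

import numpy as np

def voting_vector(x, pairwise_decide, n_classes=6):
    """Accumulate pairwise wins into the voting vector V for one sample x.

    pairwise_decide(a, b, x) is a hypothetical callable returning the
    winning label (a or b) of the sub-classifier trained on classes a, b.
    With six classes there are C(6,2) = 15 expression-pair sub-classifiers.
    """
    V = np.zeros(n_classes, dtype=int)
    for a, b in combinations(range(n_classes), 2):
        V[pairwise_decide(a, b, x)] += 1
    return V
```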
These voting numbers are then converted to probabilities by applying the softmax
function
$$P_i = P(\omega_t = i \mid X, D_k) = \frac{\exp(V_i)}{\sum_{i=1}^{C} \exp(V_i)} \tag{7}$$
This transformation does not change the classification decision of an individual
classifier; moreover, it allows us to treat the classifier within a Bayesian
probabilistic framework.
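A minimal sketch of this conversion in NumPy; because exp is monotone, the argmax over the votes and the argmax over the resulting probabilities coincide:

```python
import numpy as np

def votes_to_posteriors(V):
    """Softmax of Eqn. 7: P_i = exp(V_i) / sum_j exp(V_j).

    Subtracting max(V) before exponentiating is a standard numerical
    stability trick and does not change the result.
    """
    V = np.asarray(V, dtype=float)
    e = np.exp(V - V.max())
    return e / e.sum()

# The decision is preserved: argmax over votes == argmax over posteriors.
V = np.array([5, 3, 1, 2, 4, 0])  # e.g. votes over six expressions
assert np.argmax(V) == np.argmax(votes_to_posteriors(V))
```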
4 Experiments
The proposed approach was evaluated on the Cohn-Kanade facial expression database.
In our experiments, 374 sequences were selected from the database for basic
expression recognition. The sequences came from 97 subjects, with one to six
expressions per subject.
The coordinates of the facial fiducial points in the first frame are determined by
ASM; the CSF features extracted from 38 facial components of fixed block size
around those points are then concatenated into one histogram. Ten-fold
cross-validation was used throughout.
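For concreteness, here is a sketch of the concatenation step; the per-component layout of 3 slices with 59 bins each is our assumption, chosen only to match the 38*59*3 = 6726 dimensionality reported below:

```python
import numpy as np

# Assumed layout matching the dimensionality reported below: 38*3*59 = 6726.
N_COMPONENTS, N_SLICES, N_BINS = 38, 3, 59

def concatenate_csf_histograms(component_histograms):
    """Flatten per-component, per-slice CSF histograms into one descriptor.

    component_histograms: array of shape (N_COMPONENTS, N_SLICES, N_BINS),
    one histogram per slice of each ASM-located facial component.
    """
    h = np.asarray(component_histograms, dtype=float)
    assert h.shape == (N_COMPONENTS, N_SLICES, N_BINS)
    return h.reshape(-1)  # 6726-dimensional feature vector
```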
It was anticipated that the component size would influence performance. Fig. 4
presents results for four block sizes with CSF. From this figure we can observe
that the highest mean performance (94.92%) is reached when the component size is
16×16, which was therefore selected for the following experiments.
AdaBoost is used to select the most important slices, as described in Sec. 2.3. In
our experiments, the number of slices was varied over 15, 30, 45, 60, 75, and 90.
The corresponding average recognition accuracies are 90.37%, 91.98%, 94.12%,
93.32%, 93.05%, and 92.25%, respectively. The best accuracy, 94.12%, is obtained
with 45 slices. Compared with the result in Fig. 4 at the optimal block size, the
accuracy decreases by 0.8%, but the dimensionality of the feature space is reduced
from 38*59*3 (6726) to 45*59 (2655).
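The paper's selection procedure is the one in Sec. 2.3; as a rough stand-in, the sketch below scores whole slices by aggregating scikit-learn AdaBoost feature importances and keeps the top 45. The estimator settings and the per-slice aggregation are our assumptions, not the authors' method:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

N_BINS = 59            # bins per slice
TOTAL_SLICES = 38 * 3  # = 114 candidate slices

def select_top_slices(X, y, n_slices=45):
    """Keep the n_slices slices with the largest boosted importance.

    X: (n_samples, TOTAL_SLICES * N_BINS) concatenated histograms.
    Returns the kept slice indices and the reduced feature matrix
    (45 slices -> 45 * 59 = 2655 columns).
    """
    booster = AdaBoostClassifier(n_estimators=200, random_state=0)
    booster.fit(X, y)
    # Aggregate per-bin importances into one score per slice.
    scores = booster.feature_importances_.reshape(TOTAL_SLICES, N_BINS).sum(axis=1)
    keep = np.sort(np.argsort(scores)[::-1][:n_slices])
    cols = (keep[:, None] * N_BINS + np.arange(N_BINS)).ravel()
    return keep, X[:, cols]
```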
The six-expression classification problem was decomposed into 15 two-class
problems; each test sample is therefore classified by 15 expression-pair
sub-classifiers. In multi-classifier fusion, the 15 sub-classifiers as a whole were
regarded as one individual classifier D_k, as shown in Fig. 3. After selecting the
optimal component size, five different