Figure 10. For the video-based classification, form and motion features are concatenated
and used to train a ν-SVM for each label dimension. The outputs of the classifiers are used
to train an intermediate fusion layer realized by MLPs.
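As a rough illustration of this pipeline, the sketch below assumes scikit-learn, continuous label dimensions (valence, arousal, and dominance are placeholders), and randomly generated form/motion features; the actual feature extraction and corpus used in the study are not reproduced here.

import numpy as np
from sklearn.svm import NuSVR
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, n_form, n_motion = 200, 64, 32
label_dims = ["valence", "arousal", "dominance"]  # assumed label dimensions

form = rng.normal(size=(n_samples, n_form))       # placeholder form features
motion = rng.normal(size=(n_samples, n_motion))   # placeholder motion features
X = np.hstack([form, motion])                     # concatenate form and motion
Y = rng.uniform(-1.0, 1.0, size=(n_samples, len(label_dims)))  # placeholder labels

# One nu-SVM regressor per label dimension, trained on the concatenated features.
svms = [NuSVR(nu=0.5, kernel="rbf").fit(X, Y[:, d]) for d in range(len(label_dims))]

# Intermediate fusion: the per-dimension SVM outputs become the input of an MLP
# trained for each label dimension (here fitted on the training data itself;
# in practice held-out predictions would be used).
svm_out = np.column_stack([svm.predict(X) for svm in svms])
fusion = []
for d in range(len(label_dims)):
    mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    fusion.append(mlp.fit(svm_out, Y[:, d]))

pred = np.column_stack([m.predict(svm_out) for m in fusion])
print(pred.shape)  # (200, 3): one fused prediction per label dimension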
multiplication. Figure 11 shows the audiovisual classifier system, while
the results are given in Table 3.
4. Conclusion and Future Work
Classifying emotion is generally a difficult task when moving from
overacted data to realistic human-computer interaction. In this study,
the problem was investigated by combining different modalities. The
results of the evaluation show that the use of different modalities
Figure 11. Overall architecture of the audiovisual classifier system: the outputs of all
modalities are integrated at the word level and used to train a multilayer neural network for
each label dimension.
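As a rough sketch of this word-level fusion step, the fragment below assumes scikit-learn and synthetic per-word scores from the audio and video classifiers; the modalities, features, and data of the actual system are not reproduced here.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n_words = 500
label_dims = ["valence", "arousal", "dominance"]  # assumed label dimensions

# Placeholder per-word outputs of the unimodal classifiers.
audio_scores = rng.uniform(-1.0, 1.0, size=(n_words, len(label_dims)))
video_scores = rng.uniform(-1.0, 1.0, size=(n_words, len(label_dims)))
targets = rng.uniform(-1.0, 1.0, size=(n_words, len(label_dims)))

# Word-level integration: concatenate the modality outputs for each word.
fused_input = np.hstack([audio_scores, video_scores])

# One multilayer network per label dimension, trained on the fused inputs.
nets = {}
for d, dim in enumerate(label_dims):
    net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
    nets[dim] = net.fit(fused_input, targets[:, d])

predictions = {dim: net.predict(fused_input) for dim, net in nets.items()}
print({dim: p.shape for dim, p in predictions.items()})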