While the requirement that the individual classifiers be as accurate as possible is obvious, diversity roughly means that the classifiers should not agree on misclassified data. In our studies, various modalities and feature views of the data have been utilized to obtain such a set of diverse and accurate classifiers.
The rest of this chapter is organized as follows: Section 2 presents recent approaches for improving the recognition of emotion. Section 3 describes real-world data collections for affective computing; furthermore, adequate features are described together with a numerical evaluation. Finally, Section 4 concludes the chapter.
2. Multi-modal Classification Architectures and
Information Fusion for Emotion Recognition
2.1 Learning from multiple sources
For many benchmark data collections in the field of machine learning, it is sufficient to process a single type of feature extracted from one representation of the data (e.g. visual digit recognition). In many real-world applications, however, several independent sensors are available (e.g. microphone and camera), and it is necessary to combine these channels to obtain good recognition performance and an architecture that is robust against sensor failure.
To create a classifier system that is able to handle different sources of information, three widely used approaches have been proposed and evaluated in the literature, namely early fusion, mid-level fusion and late fusion (Dietrich et al., 2003). Using early fusion, the information is combined at the earliest level by concatenating the individual features into a higher-dimensional vector, as depicted on the left-hand side of Figure 1. The converse strategy is to combine the independent streams as late as possible, which is called late fusion or a multiple classifier system (MCS); see the right-hand side of Figure 1. The third approach, which has recently gained more attention, is known as mid-level fusion (Scherer et al., 2012; Eyben et al., 2012; Glodek et al., 2012; Dietrich et al., 2003) and combines the channels at an intermediate level of abstraction, for example in a combined hidden layer of an artificial neural network. The corresponding classifier architecture is shown in the middle of Figure 1.
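As a minimal sketch of the two extreme strategies (not the implementation used in this chapter), the following Python fragment contrasts early fusion (feature concatenation) with late fusion (averaging the class posteriors of per-channel classifiers). The feature matrices X_audio and X_video, the label vector y, and the choice of logistic regression are hypothetical placeholders; a mid-level variant would instead merge learned intermediate representations, e.g. the activations of a shared hidden layer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-channel feature matrices (n_samples x n_features)
# and emotion labels; all names here are illustrative placeholders.
X_audio = np.random.rand(100, 20)
X_video = np.random.rand(100, 30)
y = np.random.randint(0, 4, size=100)  # e.g. four emotion classes

# Early fusion: concatenate the individual features into one
# higher-dimensional vector and train a single classifier on it.
X_early = np.hstack([X_audio, X_video])
clf_early = LogisticRegression(max_iter=1000).fit(X_early, y)

# Late fusion (multiple classifier system): train one classifier per
# channel and combine their class posteriors, here by simple averaging.
clf_a = LogisticRegression(max_iter=1000).fit(X_audio, y)
clf_v = LogisticRegression(max_iter=1000).fit(X_video, y)
posterior = (clf_a.predict_proba(X_audio) + clf_v.predict_proba(X_video)) / 2
y_late = posterior.argmax(axis=1)
```

Averaging posteriors is only one of many combination rules for an MCS; voting or trained fusion mappings are equally common, and the late-fusion path keeps working when one sensor channel drops out.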
The selection of an optimal architecture is strongly related to the respective problem. An important clue for choosing the appropriate architecture can be drawn by judging the dependency and