were used as base classifiers. Linear classifiers are advantageous for noisy data and help to avoid over-fitting, because decisions are based on a rather simple function, namely a linear combination of the features. We obtained the mapping by computing the Moore-Penrose pseudoinverse of the feature matrix. MLPs are based on a superposition of multiple functions (e.g. linear or sigmoid functions), which are represented by the neurons in the hidden layer (Haykin, 1999). As a result, the complexity of the MLP can be conveniently adjusted by varying the number of hidden neurons.
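The following minimal sketch illustrates how such a linear mapping can be obtained with the Moore-Penrose pseudoinverse. The synthetic data, the one-hot target coding, and the appended bias column are illustrative assumptions, not the study's actual setup.

import numpy as np

# Toy data: n samples with d features and three classes.
# (Hypothetical dimensions; the study's actual features are not shown here.)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # feature matrix, shape (n, d)
y = rng.integers(0, 3, size=100)     # class labels in {0, 1, 2}
T = np.eye(3)[y]                     # one-hot target matrix, shape (n, 3)

# Append a constant column so the linear map includes a bias term.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares weight matrix via the Moore-Penrose pseudoinverse: W = X^+ T.
W = np.linalg.pinv(Xb) @ T

# Classify by taking the argmax over the linear outputs.
pred = np.argmax(Xb @ W, axis=1)
print("training accuracy:", np.mean(pred == y))

The pseudoinverse yields the least-squares solution of the linear system in closed form, so no iterative training is required.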
The SVM is a supervised learning method following the maximum margin paradigm. The classical SVM is a typical representative of the kernel methods, and therefore the so-called kernel trick can be applied. The kernel trick implicitly maps the data into a new feature space in which a linear separation corresponds to a non-linear decision boundary in the original space. Within our study, we used the Gaussian radial basis function (RBF) kernel, which maps the input data into an infinite-dimensional Hilbert space and is calibrated by a width parameter. However, due to noise or incorrect annotations, it is convenient to have a non-rigid hyper-plane that is less sensitive to outliers in the training data. Therefore, an extension of the SVM introduces so-called slack variables that tolerate a certain amount of misclassified data, governed by a control parameter. A probabilistic classification output can be obtained using the method proposed in Platt (1999). Detailed information on these algorithms can be found, for instance, in Bishop (2006).
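As a brief illustration, the sketch below uses scikit-learn's SVC with an RBF kernel; its probability=True option fits a Platt-style sigmoid to the SVM outputs. The synthetic data and the values of the width parameter gamma and the slack control parameter C are illustrative, not the study's settings.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data standing in for the emotion features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# RBF kernel: gamma controls the kernel width, C penalizes the slack
# variables; probability=True adds Platt-style probability estimates.
clf = SVC(kernel="rbf", gamma=0.1, C=1.0, probability=True)
clf.fit(X, y)

# Probabilistic class output for the first three samples.
print(clf.predict_proba(X[:3]))

Smaller values of C yield a softer margin that tolerates more training errors, which corresponds to the non-rigid hyper-plane described above.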
Furthermore, Markov models such as the hidden Markov model (HMM) have proven to be a suitable method for emotion recognition (Glodek et al., 2011). The HMM is a stochastic model applied for temporal/sequential pattern recognition, e.g. speech recognition and the recognition of gestures. It is composed of two random processes: a Markov chain with hidden states modeling the transitions between the states, and a second random process modeling the observations. The transition probabilities as well as the emission probabilities for the outputs are estimated using the Baum-Welch algorithm. Given the parameters of an HMM and an observed output sequence, the most likely state sequence can be computed (Viterbi algorithm), and the probability of the observation sequence can be estimated (forward algorithm). This probability can be utilized to classify sequences by choosing the most likely model (Rabiner, 1989).
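A minimal sketch of this classification scheme is given below, assuming discrete observation symbols and hand-chosen parameters in place of Baum-Welch estimates; all models, states, and symbols are hypothetical.

import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    # Forward algorithm: log P(obs | model) for a discrete-output HMM.
    # pi: initial state distribution (S,), A: transition matrix (S, S),
    # B: emission probabilities (S, K), obs: symbol sequence in [0, K).
    alpha = pi * B[:, obs[0]]                        # initialization
    log_lik = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()                          # rescale to avoid underflow
        log_lik += np.log(scale)
        alpha = (alpha / scale) @ A * B[:, obs[t]]   # recursion step
    return log_lik + np.log(alpha.sum())

# Two illustrative 2-state, 3-symbol models, one per class.
pi = np.array([0.6, 0.4])
A1 = np.array([[0.7, 0.3], [0.4, 0.6]])
B1 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]])
B2 = np.array([[0.3, 0.3, 0.4], [0.4, 0.4, 0.2]])
obs = [0, 1, 2, 2, 1]

# Classify the sequence by the most likely model (maximum likelihood).
scores = [forward_log_likelihood(pi, A, B, obs) for A, B in ((A1, B1), (A2, B2))]
print("predicted class:", int(np.argmax(scores)))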
3. Experiments
In this section, several of the aspects presented in the previous sections are evaluated on non-acted emotional data sets.