Fig. 7.2 Recursive call of the ID3 algorithm for a feature n that maximises the IG(R) with respect to the classification of L within M [1]
that this reduces the number of features, i.e., an inherent feature selection by IG(R)
is given. DTs are able to handle missing features both in training and recognition.
Further, if both the feature set and the training set are randomly sub-sampled for the
construction of an ensemble (cf. Sect. 7.4) of DTs, one speaks of Random Forests
(RF), which are known to be competitive classifiers [5], e.g., compared with the
classifiers introduced in the following.
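The Random Forest construction described above, i.e., growing each tree on a bootstrap sample of the instances and a random subset of the features and combining the trees by majority vote, can be sketched as follows. This is a minimal illustration, not the text's reference implementation; function names and parameters (`n_trees`, the sqrt-of-dimensionality feature-subset size) are illustrative choices, and scikit-learn's `DecisionTreeClassifier` is assumed as the base DT learner.

```python
# Minimal Random Forest sketch: each DT sees a bootstrap sample of the
# training instances and a random subset of the features (cf. Sect. 7.4).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=25, n_feat=None, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # common heuristic (assumption, not from the text): sqrt(d) features per tree
    n_feat = n_feat or max(1, int(np.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                  # bootstrap instances
        cols = rng.choice(d, size=n_feat, replace=False)   # sub-sample features
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        forest.append((tree, cols))
    return forest

def predict_forest(forest, X):
    # majority vote over the ensemble of trees
    votes = np.stack([tree.predict(X[:, cols]) for tree, cols in forest])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Because every tree sees a different random view of the data, the individual trees are decorrelated, which is what makes the voted ensemble more robust than any single DT.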
7.2.2 Support Vectors
Support Vector Machines (SVM) and Support Vector Regression (SVR) were intro-
duced in [6]. In principle, they are based on statistical learning theory, and their
theoretical foundation can be interpreted as an analogue to electrostatics: a training
instance corresponds to a charged conductor at a given position in space, the decision
function corresponds to an electrostatic potential function, and the learning target
function to Coulomb's energy [7].
The concept of SVM and SVR unites several theories of machine learning and
optimisation: first, a linear classifier or regressor (similar to a perceptron with a
linear activation function) is combined with a non-linear mapping into a higher-
dimensional decision space in order to be able to solve more complex decision tasks.
The linear classifier is thereby built on a subset of the learning instances, the so-
called 'support vectors'. This limits the danger of overfitting to the learning instances
as a whole. The choice of support vectors is obtained by solving a quadratic
optimisation problem.
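The effect of the non-linear mapping can be made concrete with the classic XOR problem, which no linear classifier can solve in the original space. The sketch below, assuming scikit-learn's `SVC` with an RBF kernel (the kernel choice and the `gamma`/`C` values are illustrative, not prescribed by the text), separates the XOR labelling and exposes the support vectors as a subset of the training instances.

```python
# XOR data is not linearly separable in the input space; an SVM with a
# non-linear (here: RBF) kernel solves it, and only the instances selected
# by the quadratic optimisation appear as support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, +1, +1, -1])          # XOR labelling with y in {-1, +1}

clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(clf.predict(X))                   # reproduces the XOR labels
print(clf.support_)                     # indices of the support vectors
```

Replacing `kernel="rbf"` with `kernel="linear"` makes the same model fail on this data, which illustrates why the mapping into a higher-dimensional decision space is needed.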
7.2.2.1 Support Vector Machines
In general, SVM are thereby capable of discriminating between two classes, i.e., of
solving binary decision problems. We will at first focus on this task; multi-class
problems can then be solved by diverse strategies such as one-versus-one pairwise
decisions, one-versus-all decisions, or binary-tree-based grouping of decisions.
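Two of the strategies just named, one-versus-one and one-versus-all, can be sketched with binary SVMs as building blocks. The wrappers and the toy three-class data below are illustrative assumptions (scikit-learn's `OneVsOneClassifier` and `OneVsRestClassifier` around a linear-kernel `SVC`), not part of the text.

```python
# Multi-class decisions from binary SVMs: one-versus-one trains one SVM per
# class pair (here 3 pairs for 3 classes); one-versus-all trains one SVM per
# class against the rest (here 3 SVMs).
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=150, centers=3, random_state=0)  # 3-class toy data

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ovo.estimators_), len(ovr.estimators_))  # number of binary SVMs each
```

For K classes, one-versus-one needs K(K-1)/2 binary machines but each is trained on less data, whereas one-versus-all needs only K machines, each trained on the full set; the best choice is task-dependent.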
SVM are trained based on a set of L learning instances, where each of the
instances is accordingly labelled with its class. For l = 1, ..., L the assignment
of a pattern instance x_l to its class is denoted by y_l ∈ {−1, +1}. By definition the