methods are target projection (TP), orthogonal PLS, etc. (Rajalahti and
Kvalheim, 2011).
Support Vector Machines
Support Vector Machines (SVM) are a group of supervised learning algorithms that can be used for classification or regression purposes. The SVM algorithm is based upon statistical learning theory and the Vapnik-Chervonenkis (VC) dimension (Vapnik and Chervonenkis, 1974). Standard SVM is a binary classifier that separates inputs into two possible outputs (classes). In contrast to the previously described FA methods, where dimensionality reduction enables finding of LVs, SVM algorithms map the samples into a space of higher (even infinite) dimensionality, in which a separating hyperplane is defined. For classification purposes, good separation is achieved when the distance between samples belonging to different classes in this space is large. Samples that were not separable in the original space may then be distinguished in the newly created space (Roggo et al., 2010).
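The effect of such a mapping can be illustrated with a small numerical sketch (the data and variable names below are illustrative assumptions, not taken from the source): points inside and outside a circle cannot be separated by a straight line in two dimensions, but adding a third coordinate equal to the squared distance from the origin makes the two groups separable by a flat plane.

```python
# Minimal sketch: a 2-D problem that is not linearly separable becomes
# separable after an explicit mapping into a higher-dimensional space.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))             # original 2-D samples
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 0.5).astype(int)   # class: outside/inside a circle

# Explicit mapping (x1, x2) -> (x1, x2, x1^2 + x2^2)
Z = np.column_stack([X, X[:, 0] ** 2 + X[:, 1] ** 2])

# In the mapped space the two classes lie on opposite sides of the plane z = 0.5
separable = np.all((Z[:, 2] > 0.5) == (y == 1))
print("Linearly separable in the mapped space:", separable)
```

In practice the SVM does not construct this mapping explicitly; the kernel function described next provides the inner products in the higher-dimensional space directly.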
Construction of the higher-dimensional space by SVM is based upon the definition of a kernel function K(x, y), which is applied to the data in the original space (Press et al., 2007). Commonly used kernel functions are linear, polynomial, radial basis function (RBF), and sigmoidal, where the latter makes the SVM algorithm equivalent to a two-layer perceptron neural network (Section 5.1.2.1). RBF is the most often used kernel function, since it can handle cases where the relation between the class labels (the target values) and the attributes (the features of the training set) is nonlinear:
K(x_i, x_j) = exp(−γ ||x_i − x_j||²)   [4.18]
with γ being a parameter that controls the width of the kernel function, and x_i and x_j being the vectors of the i-th and the j-th training samples, respectively.
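The following minimal sketch evaluates Eq. [4.18] for a pair of sample vectors; the sample values and the choices of γ are illustrative assumptions, not taken from the source. It also shows how γ controls the kernel width: larger values make the similarity decay faster with distance.

```python
# Sketch of the RBF kernel of Eq. [4.18] for two sample vectors.
import numpy as np

def rbf_kernel(x_i, x_j, gamma=1.0):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    diff = x_i - x_j
    return np.exp(-gamma * np.dot(diff, diff))

x_i = np.array([1.0, 2.0, 3.0])
x_j = np.array([1.5, 1.8, 2.9])

# Larger gamma -> narrower kernel -> similarity drops off faster with distance
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x_i, x_j, gamma))
```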
SVMs are similar to neural networks, with the main difference being the way in which the weights are adjusted during training. In SVMs, weights are adjusted by solving a quadratic programming problem with linear constraints. Independent (predictor) variables are denoted as attributes, whereas a transformed attribute that is used to define the hyperplane is called a feature. The task of choosing the most suitable representation is known as feature selection. A set of features that describes one sample (i.e. a row of independent, predictor values) is
called a vector. Therefore, the goal of the SVM algorithm is to find the optimal hyperplane that separates clusters of vectors in such a way that cases with one category of the target variable are on one side of the plane and cases with the other category are on the other side of the plane.
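As a rough illustration of this idea, the sketch below fits a binary RBF-kernel SVM on synthetic data using scikit-learn; the library, the synthetic data, and the parameter values (gamma, C) are assumptions chosen for demonstration and are not described in the source text.

```python
# Hedged sketch: binary SVM classification with an RBF kernel (scikit-learn).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Each row is a feature vector (one sample); y holds the two class labels.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
               rng.normal(2.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

# gamma controls the width of the RBF kernel (Eq. [4.18]); C penalises
# training errors when no perfectly separating hyperplane exists.
model = SVC(kernel="rbf", gamma=0.5, C=1.0)
model.fit(X, y)

print("Support vectors per class:", model.n_support_)
print("Training accuracy:", model.score(X, y))
```

In practice, γ and the penalty parameter C are usually tuned by cross-validation, since together they determine how flexible the resulting decision boundary is.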