4.2 Machine Learning: Prologue on Support Vector Regression
Support Vector Machines (SVM), originally used for classification purposes, can also be applied to regression problems by introducing an alternative loss function, as already stated in the previous section. Support Vector Regression (SVR) maps the input data x into a higher-dimensional feature space F by a nonlinear mapping, and a linear regression problem is then obtained and solved in this feature space. The goal is to find a function f(x) that has a maximum deviation of ε from the actually obtained output y for all the training data; that is, all errors less than ε are accepted, but not more than that. Given a set of N training data $\{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i \in \mathbb{R},\ i = 1, 2, \ldots, N\}$, where x_i denotes the input vector of dimension n, y_i is the corresponding target value, and N is the total number of data patterns. The linear regression function is:

$$f(x) = \langle w, x \rangle + b, \qquad w \in \mathbb{R}^n,\ b \in \mathbb{R} \qquad (3)$$
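As a minimal illustration of Eq. (3), the sketch below evaluates the linear regression function for a single input vector using NumPy; the dimension n = 3 and the particular values of w, b, and x are assumptions chosen only for illustration, not taken from the text.

import numpy as np

# Illustrative (assumed) weight vector w in R^3 and bias term b
w = np.array([0.5, -1.2, 0.3])
b = 0.7

def f(x, w, b):
    # Linear regression function of Eq. (3): f(x) = <w, x> + b
    return np.dot(w, x) + b

x = np.array([1.0, 0.0, 2.0])   # an input vector x in R^3
print(f(x, w, b))               # 0.5*1.0 - 1.2*0.0 + 0.3*2.0 + 0.7 = 1.8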
Here, w is the weight vector and b is the bias term. To estimate the values of w and b for the selection of the best hyperplane, we need to minimize the following regularized risk function:
$$R = \frac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{N} L\big(y_i, f(x_i)\big) \qquad (4)$$
where the first term is the regularization term, which represents the prediction ability of the regression, and the second term is the empirical error or risk; the constant C > 0 determines the trade-off between the training errors and the model complexity. The ε-insensitive loss function, present in the second term of the risk function, is defined as:
$$L\big(y, f(x)\big) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & |y - f(x)| > \varepsilon \end{cases}$$
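To make the ε-insensitive loss and the risk function of Eq. (4) concrete, the following sketch computes both for a small toy data set; the sample values, ε = 0.1, and C = 1.0 are assumptions chosen purely for illustration.

import numpy as np

def eps_insensitive_loss(y, y_pred, eps):
    # epsilon-insensitive loss: zero inside the eps-tube, linear outside it
    return np.maximum(0.0, np.abs(y - y_pred) - eps)

def regularized_risk(w, y, y_pred, C, eps):
    # Regularized risk of Eq. (4): 0.5*||w||^2 + C * sum of empirical losses
    return 0.5 * np.dot(w, w) + C * np.sum(eps_insensitive_loss(y, y_pred, eps))

w = np.array([0.5, -1.2, 0.3])          # assumed weight vector
y = np.array([1.0, 2.0, 0.5])           # observed targets y_i
y_pred = np.array([1.05, 1.7, 0.5])     # model predictions f(x_i)
print(regularized_risk(w, y, y_pred, C=1.0, eps=0.1))   # 0.89 + 1.0*0.2 = 1.09

Only the second point lies outside the ε-tube (|2.0 − 1.7| = 0.3 > 0.1), so it alone contributes to the empirical-risk term.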
The ε-insensitive loss function gives the loss incurred by predicting f(x) instead of y. Now, we introduce two slack variables ξ_i and ξ_i* into the above regression estimation problem to transform it into an equivalent constrained optimization problem. The loss function and the slack variables allow the presence of noisy data; here, noisy data refers to those data points which lie outside the ε-tube. If the observed point is above the ε-tube, ξ_i is the positive difference between the observed value and ε, and if the observed point is below the ε-tube, ξ_i* is the negative difference between the observed value and ε. Hence, the constrained optimization problem formed amounts to minimizing the following equation:
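A sketch of this problem in its standard textbook form (the exact notation used here may differ slightly) is:

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{N} \big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \qquad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \ldots, N.$$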
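In practice this formulation is available in standard libraries; the sketch below uses scikit-learn's SVR, in which the kernel performs the implicit nonlinear mapping into the feature space F, while C and epsilon play the roles of the trade-off constant and the tube width described above. The toy data and parameter values are assumptions chosen only for illustration.

import numpy as np
from sklearn.svm import SVR

# Toy one-dimensional regression problem (assumed data)
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

# RBF kernel: implicit nonlinear mapping into the feature space F;
# C sets the error/complexity trade-off, epsilon the width of the tube.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)

print(model.predict(np.array([[0.5]])))   # prediction near sin(0.5)
print(len(model.support_))                # number of support vectors

Points predicted within the ε-tube do not become support vectors, so increasing epsilon typically yields a sparser model at the cost of accuracy.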