Hinge loss (implicitly introduced by Vapnik) in binary SVM classification: $L(f(x), y) = (1 - yf(x))_+$
Binomial deviance: $L(f(x), y) = \log(1 + \exp(-2yf(x)))$
Squared error: $L(f(x), y) = (1 - yf(x))^2$
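To make the three losses concrete, here is a minimal Python sketch (not from the source; the function names are illustrative only) that evaluates each of them for a label y in {-1, +1} and a prediction f(x):

```python
import numpy as np

def hinge_loss(y, f):
    """Hinge loss (1 - y f(x))_+ used in binary SVM classification."""
    return np.maximum(0.0, 1.0 - y * f)

def binomial_deviance(y, f):
    """Binomial deviance log(1 + exp(-2 y f(x)))."""
    # logaddexp(0, t) = log(1 + exp(t)), computed in a numerically stable way
    return np.logaddexp(0.0, -2.0 * y * f)

def squared_error(y, f):
    """Squared error (1 - y f(x))^2 for labels y in {-1, +1}."""
    return (1.0 - y * f) ** 2
```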
Given a loss function, the goal of learning is to find an approximation
function f(x) that minimizes the expected risk, or the generalization error
$$E_{P(x,y)}\, L(f(x), y) \qquad (1)$$
where P(x,y) is the unknown joint distribution of future observations (x,y).
Given a finite sample from the (X, Y) domain, this problem is ill-posed.
The regularization approach championed by Poggio and rooted in Tikhonov
regularization theory [17] restores well-posedness (existence, uniqueness, and
stability) by restricting the hypothesis space, the functional space of possible
solutions:
$$f = \operatorname*{argmin}_{f \in H} \; \frac{1}{m} \sum_{i=1}^{m} L(f(x_i), y_i) + \gamma \|f\|_K^2 \qquad (2)$$
The hypothesis space H here is a Reproducing Kernel Hilbert Space (RKHS)
defined by the kernel K, and γ is a positive regularization parameter.
The mathematical foundations for this framework, as well as a key algorithm to solve (2), are derived elegantly by Poggio and Smale [14] for the quadratic loss function. The algorithm can be summarized as follows:
1. Start with the data $(x_i, y_i)_{i=1}^{m}$.
2. Choose a symmetric, positive definite kernel, such as
$$K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right). \qquad (3)$$
3. Set
$$f(x) = \sum_{i=1}^{m} c_i K(x_i, x), \qquad (4)$$
where c is a solution to
$$(m\gamma I + K)\, c = y, \qquad (5)$$
which represents a well-posed ridge regression model [12].
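As an illustration of steps 1-3, the following Python sketch (not from the source; the function names such as rlsc_fit, the pairwise-distance implementation, and any parameter values are assumptions) builds the Gaussian kernel (3), solves the linear system (5) for c, and evaluates the expansion (4):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for all pairs of rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def rlsc_fit(X, y, gamma, sigma):
    """Solve (m*gamma*I + K) c = y for the expansion coefficients c, as in (5)."""
    m = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(m * gamma * np.eye(m) + K, y)

def rlsc_predict(X_train, c, X_new, sigma):
    """Evaluate f(x) = sum_i c_i K(x_i, x), as in (4); sign(f) gives the class label."""
    return gaussian_kernel(X_new, X_train, sigma) @ c
```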
The generalization ability of this solution and the choice of the regularization parameter γ were studied in [6, 7]. Thus, using the square loss function with regularization leads to solving a simple, well-defined linear problem. This
is the core of Regularized Least Squares Classification (RLSC). The solution is a linear kernel expansion of the same form
as the one given by support vector machines (SVM). Note also that the SVM
formulation naturally fits in the regularization framework (2). Inserting the
SVM hinge loss function $L(f(x), y) = (1 - yf(x))_+$ into (2) leads to a solution that is sparse in the coefficients c, but it introduces the cost of having to solve a quadratic optimization problem instead of the linear system of RLSC.
RLSC with the square loss function, which is more common for regression,
has also proven to be very effective in binary classification problems [15, 16].
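The practical difference can be seen in a small sketch (not from the source; the toy data, parameter values, and the use of scikit-learn's SVC as an off-the-shelf stand-in for the hinge-loss quadratic program are assumptions): RLSC amounts to one dense linear solve and typically assigns a nonzero coefficient to every training point, while the hinge-loss solution expands only over its support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Toy binary classification data with labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# RLSC: one dense linear solve, (m*gamma*I + K) c = y, with a Gaussian kernel
m, gamma, sigma = len(X), 0.01, 1.0
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
c = np.linalg.solve(m * gamma * np.eye(m) + K, y)

# Hinge-loss SVM: quadratic program, expansion only over support vectors
# (SVC's `gamma` is the RBF width parameter, not the regularization parameter in (2))
svm = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

print("RLSC nonzero coefficients:", np.count_nonzero(c))  # typically all 100
print("SVM support vectors:      ", svm.support_.size)    # typically far fewer
```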