Hinge loss (implicitly introduced by Vapnik) in binary SVM classification: $L(f(x), y) = (1 - yf(x))_+$
Binomial deviance: $L(f(x), y) = \log(1 + \exp(-2yf(x)))$
Squared error: $L(f(x), y) = (1 - yf(x))^2$
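To make the three losses concrete, here is a minimal Python sketch (not from the source; the function names are illustrative only) that evaluates each of them for a label y in {-1, +1} and a prediction f(x):

```python
import numpy as np

def hinge_loss(y, f):
    """Hinge loss (1 - y f(x))_+ used in binary SVM classification."""
    return np.maximum(0.0, 1.0 - y * f)

def binomial_deviance(y, f):
    """Binomial deviance log(1 + exp(-2 y f(x)))."""
    # logaddexp(0, t) = log(1 + exp(t)), computed in a numerically stable way
    return np.logaddexp(0.0, -2.0 * y * f)

def squared_error(y, f):
    """Squared error (1 - y f(x))^2 for labels y in {-1, +1}."""
    return (1.0 - y * f) ** 2
```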
Given a loss function, the goal of learning is to find an approximation
function f(x) that minimizes the expected risk, or the generalization error
$$E_{P(x,y)}\, L(f(x), y) \qquad (1)$$
where P(x,y) is the unknown joint distribution of future observations (x,y).
Given a finite sample from the (X, Y) domain, this problem is ill-posed.
The regularization approach championed by Poggio and rooted in Tikhonov
regularization theory [17] restores well-posedness (existence, uniqueness, and
stability) by restricting the hypothesis space, the functional space of possible
solutions:
$$f = \operatorname*{argmin}_{f \in H} \; \frac{1}{m} \sum_{i=1}^{m} L(f(x_i), y_i) + \gamma \|f\|_K^2 \qquad (2)$$
The hypothesis space H here is a Reproducing Kernel Hilbert Space (RKHS)
defined by the kernel K, and γ is a positive regularization parameter.
The mathematical foundations for this framework, as well as a key algorithm to solve (2), are derived elegantly by Poggio and Smale [14] for the quadratic loss function. The algorithm can be summarized as follows:
1. Start with the data $(x_i, y_i)_{i=1}^{m}$.
2. Choose a symmetric, positive definite kernel, such as
$$K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right). \qquad (3)$$
3. Set
$$f(x) = \sum_{i=1}^{m} c_i K(x_i, x), \qquad (4)$$
where c is a solution to
$$(m\gamma I + K)\, c = y, \qquad (5)$$
which represents a well-posed ridge regression model [12].
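As an illustration of steps 1-3, the following Python sketch (not from the source; the function names such as rlsc_fit, the pairwise-distance implementation, and any parameter values are assumptions) builds the Gaussian kernel (3), solves the linear system (5) for c, and evaluates the expansion (4):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for all pairs of rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def rlsc_fit(X, y, gamma, sigma):
    """Solve (m*gamma*I + K) c = y for the expansion coefficients c, as in (5)."""
    m = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(m * gamma * np.eye(m) + K, y)

def rlsc_predict(X_train, c, X_new, sigma):
    """Evaluate f(x) = sum_i c_i K(x_i, x), as in (4); sign(f) gives the class label."""
    return gaussian_kernel(X_new, X_train, sigma) @ c
```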
The generalization ability of this solution and the choice of the regularization parameter γ were studied in [6, 7]. Thus, using the square loss function with regularization leads to solving a simple, well-defined linear problem. This
is the core of Regularized Least Squares Classification (RLSC). The solution is a linear kernel expansion of the same form
as the one given by support vector machines (SVM). Note also that the SVM
formulation naturally fits in the regularization framework (2). Inserting the
SVM hinge loss function $L(f(x), y) = (1 - yf(x))_+$ into (2) leads to a solution that is sparse in the coefficients c, but it introduces the cost of having to solve a quadratic optimization problem instead of the linear system of RLSC.
RLSC with the square loss function, which is more common for regression,
has also proven to be very effective in binary classification problems [15, 16].
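The practical difference can be seen in a small sketch (not from the source; the toy data, parameter values, and the use of scikit-learn's SVC as an off-the-shelf stand-in for the hinge-loss quadratic program are assumptions): RLSC amounts to one dense linear solve and typically assigns a nonzero coefficient to every training point, while the hinge-loss solution expands only over its support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Toy binary classification data with labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# RLSC: one dense linear solve, (m*gamma*I + K) c = y, with a Gaussian kernel
m, gamma, sigma = len(X), 0.01, 1.0
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
c = np.linalg.solve(m * gamma * np.eye(m) + K, y)

# Hinge-loss SVM: quadratic program, expansion only over support vectors
# (SVC's `gamma` is the RBF width parameter, not the regularization parameter in (2))
svm = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

print("RLSC nonzero coefficients:", np.count_nonzero(c))  # typically all 100
print("SVM support vectors:      ", svm.support_.size)    # typically far fewer
```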