14.1.1.1 Loss Functions
Suppose that we have a data set composed of $n$ examples $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^p$ is an input vector with $p$ features and $y_i$ is a response of $x_i$. We consider loss functions $f(\beta, \beta_0)$ that involve generalized linear models. Popular examples include:
Ordinary least squares. This loss function is used for fitting a linear model to real-valued responses $y_i \in \mathbb{R}$, $i = 1, 2, \ldots, n$:
$$
f(\beta, \beta_0) = \frac{1}{2} \sum_{i=1}^{n} \left( y_i - \beta^{T} x_i - \beta_0 \right)^2 . \qquad (14.2)
$$
Here the inner product between the two vectors $\beta$ and $x_i$ has been represented as $\beta^{T} x_i = \sum_{j=1}^{p} \beta_j x_{ij}$, where $\beta_j$ is the $j$th element of $\beta$.
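As a concrete illustration, here is a minimal NumPy sketch of the loss in Eq. (14.2). The function name `ols_loss` and the convention that the rows of `X` are the examples $x_i$ are assumptions made for this example only.

```python
import numpy as np

def ols_loss(beta, beta0, X, y):
    """Ordinary least squares loss, Eq. (14.2):
    f(beta, beta0) = 1/2 * sum_i (y_i - beta^T x_i - beta0)^2."""
    residuals = y - X @ beta - beta0   # X is (n, p), beta has length p
    return 0.5 * np.sum(residuals ** 2)
```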
Logistic regression. The loss function is for classifying binary responses $y_i \in \{-1, +1\}$:
$$
f(\beta, \beta_0) = \sum_{i=1}^{n} \log\left( 1 + \exp\left( -y_i \left( \beta^{T} x_i + \beta_0 \right) \right) \right) .
$$
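A corresponding sketch of the logistic loss might look as follows; again, the name `logistic_loss` and the data layout are assumptions for this illustration, and `np.logaddexp` is used only to evaluate $\log(1 + \exp(\cdot))$ stably.

```python
import numpy as np

def logistic_loss(beta, beta0, X, y):
    """Logistic regression loss:
    f(beta, beta0) = sum_i log(1 + exp(-y_i * (beta^T x_i + beta0))),
    with labels y_i in {-1, +1}."""
    margins = y * (X @ beta + beta0)
    # np.logaddexp(0, -m) computes log(1 + exp(-m)) without overflow
    return np.sum(np.logaddexp(0.0, -margins))
```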
Cox regression. This loss function is for the case where responses are given by $y_i = (t_i, e_i)$: $t_i \in \mathbb{R}_{+}$ is the survival time of the $i$th patient (whose genetic profile is given by a vector $x_i$) and $e_i$ is an indicator variable ($e_i \in \{0, 1\}$; $e_i = 1$ if the $i$th patient had an event, $e_i = 0$ otherwise). It is defined by
$$
f(\beta, \beta_0) = - \sum_{i \in E} \left( \beta^{T} x_i - \log \sum_{j \in R_i} \exp\left( \beta^{T} x_j \right) \right) ,
$$
where $E$ is an index set of all patients who have events, and $R_i$ is an index set of patients who are at risk at the time $t_i$. This is the negative partial log-likelihood function due to the proportional hazards model proposed by [6], which typically appears in survival analysis.
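The following sketch evaluates this negative partial log-likelihood under the simplest reading of the formula: the risk set $R_i$ is taken as all patients with $t_j \ge t_i$, ties are not treated specially, and $\beta_0$ is omitted since it cancels in the partial likelihood. The function name `cox_loss` and the array layout are assumptions for this example.

```python
import numpy as np

def cox_loss(beta, X, t, e):
    """Negative Cox partial log-likelihood:
    f(beta) = -sum_{i in E} [beta^T x_i - log(sum_{j in R_i} exp(beta^T x_j))],
    with E = {i : e_i = 1} and R_i = {j : t_j >= t_i}."""
    scores = X @ beta                          # beta^T x_j for every patient
    loss = 0.0
    for i in np.flatnonzero(e == 1):           # loop over patients with an event (set E)
        risk_scores = scores[t >= t[i]]        # beta^T x_j over the risk set R_i
        # log-sum-exp written in a numerically stable form
        m = risk_scores.max()
        loss -= scores[i] - (m + np.log(np.sum(np.exp(risk_scores - m))))
    return loss
```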
A common property of the three loss functions above is the convexity of $f$ in both of its arguments, $\beta \in \mathbb{R}^p$ and $\beta_0 \in \mathbb{R}$. A function $f(\beta)$ is convex in $\mathbb{R}^p$ if for any $\alpha \in [0, 1]$,
$$
f\left( \alpha \beta + (1 - \alpha) \beta' \right) \le \alpha f(\beta) + (1 - \alpha) f(\beta') , \qquad \beta, \beta' \in \mathbb{R}^p .
$$
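For intuition, the inequality can be spot-checked numerically. The sketch below does so for the ordinary least squares loss of Eq. (14.2) with $\beta_0 = 0$ on synthetic data; it is a sanity check under assumed random inputs, not a proof of convexity.

```python
import numpy as np

def f(beta, X, y):
    """OLS loss of Eq. (14.2) with beta0 = 0, used only to illustrate convexity."""
    return 0.5 * np.sum((y - X @ beta) ** 2)

rng = np.random.default_rng(0)                 # assumed synthetic data
X, y = rng.normal(size=(20, 5)), rng.normal(size=20)
b1, b2 = rng.normal(size=5), rng.normal(size=5)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = f(alpha * b1 + (1 - alpha) * b2, X, y)
    rhs = alpha * f(b1, X, y) + (1 - alpha) * f(b2, X, y)
    assert lhs <= rhs + 1e-9                   # f(alpha*b1 + (1-alpha)*b2) <= alpha*f(b1) + (1-alpha)*f(b2)
```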
Convexity of $f$ is a desirable property, together with the convexity of the regularizer, since it facilitates finding a minimizer of (14.1). In our discussion, we only require that $f$ is convex and continuously differentiable, except that in some derivations we use the ordinary least squares loss $f$ because it leads to the simplest derivation.