14.1.1.1 Loss Functions
Suppose that we have a data set composed of $n$ examples $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^p$ is an input vector with $p$ features and $y_i$ is a response of $x_i$. We consider loss functions $f(\beta, \beta_0)$ that involve generalized linear models. Popular examples include:
Ordinary least squares. This loss function is used for fitting a linear model to real-valued responses $y_i \in \mathbb{R}$, $i = 1, 2, \ldots, n$:
$$
f(\beta, \beta_0) = \frac{1}{2} \sum_{i=1}^{n} \left( y_i - \beta^{T} x_i - \beta_0 \right)^2 . \qquad (14.2)
$$
Here the inner product between the two vectors $\beta$ and $x_i$ has been represented as $\beta^{T} x_i = \sum_{j=1}^{p} \beta_j x_{ij}$, where $\beta_j$ is the $j$th element of $\beta$.
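As a concrete illustration, here is a minimal NumPy sketch of the loss in Eq. (14.2). The function name `ols_loss` and the convention that the rows of `X` are the examples $x_i$ are assumptions made for this example only.

```python
import numpy as np

def ols_loss(beta, beta0, X, y):
    """Ordinary least squares loss, Eq. (14.2):
    f(beta, beta0) = 1/2 * sum_i (y_i - beta^T x_i - beta0)^2."""
    residuals = y - X @ beta - beta0   # X is (n, p), beta has length p
    return 0.5 * np.sum(residuals ** 2)
```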
Logistic regression. The loss function is for classifying binary responses $y_i \in \{-1, +1\}$:
$$
f(\beta, \beta_0) = \sum_{i=1}^{n} \log\left( 1 + \exp\left( -y_i \left( \beta^{T} x_i + \beta_0 \right) \right) \right) .
$$
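A corresponding sketch of the logistic loss might look as follows; again, the name `logistic_loss` and the data layout are assumptions for this illustration, and `np.logaddexp` is used only to evaluate $\log(1 + \exp(\cdot))$ stably.

```python
import numpy as np

def logistic_loss(beta, beta0, X, y):
    """Logistic regression loss:
    f(beta, beta0) = sum_i log(1 + exp(-y_i * (beta^T x_i + beta0))),
    with labels y_i in {-1, +1}."""
    margins = y * (X @ beta + beta0)
    # np.logaddexp(0, -m) computes log(1 + exp(-m)) without overflow
    return np.sum(np.logaddexp(0.0, -margins))
```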
Cox regression. This loss function is for the case where responses are given by $y_i = (t_i, e_i)$: $t_i \in \mathbb{R}_{+}$ is the survival time of the $i$th patient (whose genetic profile is given by a vector $x_i$) and $e_i$ is an indicator variable ($e_i \in \{0, 1\}$; $e_i = 1$ if the $i$th patient had an event, $e_i = 0$ otherwise). It is defined by
$$
f(\beta, \beta_0) = - \sum_{i \in E} \left( \beta^{T} x_i - \log \sum_{j \in R_i} \exp\left( \beta^{T} x_j \right) \right) ,
$$
where $E$ is an index set of all patients who have events, and $R_i$ is an index set of patients who are at risk at the time $t_i$. This is the negative partial log-likelihood function due to the proportional hazards model proposed by [6], which typically appears in survival analysis.
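The following sketch evaluates this negative partial log-likelihood under the simplest reading of the formula: the risk set $R_i$ is taken as all patients with $t_j \ge t_i$, ties are not treated specially, and $\beta_0$ is omitted since it cancels in the partial likelihood. The function name `cox_loss` and the array layout are assumptions for this example.

```python
import numpy as np

def cox_loss(beta, X, t, e):
    """Negative Cox partial log-likelihood:
    f(beta) = -sum_{i in E} [beta^T x_i - log(sum_{j in R_i} exp(beta^T x_j))],
    with E = {i : e_i = 1} and R_i = {j : t_j >= t_i}."""
    scores = X @ beta                          # beta^T x_j for every patient
    loss = 0.0
    for i in np.flatnonzero(e == 1):           # loop over patients with an event (set E)
        risk_scores = scores[t >= t[i]]        # beta^T x_j over the risk set R_i
        # log-sum-exp written in a numerically stable form
        m = risk_scores.max()
        loss -= scores[i] - (m + np.log(np.sum(np.exp(risk_scores - m))))
    return loss
```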
A common property of the three loss functions above is the convexity of $f$ in both of its arguments, $\beta \in \mathbb{R}^p$ and $\beta_0 \in \mathbb{R}$. A function $f(\beta)$ is convex in $\mathbb{R}^p$ if for any $\alpha \in [0, 1]$,
$$
f\left( \alpha \beta + (1 - \alpha) \beta' \right) \le \alpha f(\beta) + (1 - \alpha) f(\beta') , \qquad \beta, \beta' \in \mathbb{R}^p .
$$
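For intuition, the inequality can be spot-checked numerically. The sketch below does so for the ordinary least squares loss of Eq. (14.2) with $\beta_0 = 0$ on synthetic data; it is a sanity check under assumed random inputs, not a proof of convexity.

```python
import numpy as np

def f(beta, X, y):
    """OLS loss of Eq. (14.2) with beta0 = 0, used only to illustrate convexity."""
    return 0.5 * np.sum((y - X @ beta) ** 2)

rng = np.random.default_rng(0)                 # assumed synthetic data
X, y = rng.normal(size=(20, 5)), rng.normal(size=20)
b1, b2 = rng.normal(size=5), rng.normal(size=5)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = f(alpha * b1 + (1 - alpha) * b2, X, y)
    rhs = alpha * f(b1, X, y) + (1 - alpha) * f(b2, X, y)
    assert lhs <= rhs + 1e-9                   # f(alpha*b1 + (1-alpha)*b2) <= alpha*f(b1) + (1-alpha)*f(b2)
```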
Convexity of $f$ is a desirable property, together with the convexity of the regularizer, since it facilitates finding a minimizer of (14.1). In our discussion, we only require that $f$ is convex and continuously differentiable, except that in some derivations we use the ordinary least squares loss $f$ because it leads to the simplest derivation.