14.1.1.1 Loss Functions
Suppose that we have a data set composed of $n$ examples $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^p$ is an input vector with $p$ features and $y_i$ is a response of $x_i$. We consider loss functions $f(\beta, \beta_0)$ that involve generalized linear models. Popular examples include:
• Ordinary least squares. This loss function is used for fitting a linear model to real-valued responses $y_i \in \mathbb{R}$, $i = 1, 2, \ldots, n$:
$$f(\beta, \beta_0) = \frac{1}{2} \sum_{i=1}^{n} \left( y_i - \beta^T x_i - \beta_0 \right)^2. \tag{14.2}$$
Here the inner product between two vectors $\beta$ and $x_i$ has been represented as $\beta^T x_i = \sum_{j=1}^{p} \beta_j x_{ij}$, where $\beta_j$ is the $j$th element of $\beta$ (and $x_{ij}$ the $j$th element of $x_i$).
• Logistic regression. The loss function is for classifying binary responses $y_i \in \{-1, +1\}$:
$$f(\beta, \beta_0) = \sum_{i=1}^{n} \log\left( 1 + \exp\left( -y_i (\beta^T x_i + \beta_0) \right) \right).$$
• Cox regression. This loss function is for the case where responses are given by $y_i = (t_i, e_i)$: $t_i \in \mathbb{R}_+$ is the survival time of the $i$th patient (whose genetic profile is given by a vector $x_i$) and $e_i \in \{0, 1\}$ is an indicator variable ($e_i = 1$ if the $i$th patient had an event, $e_i = 0$ otherwise). It is defined by
$$f(\beta, \beta_0) = -\sum_{i \in E} \log \left( \frac{\exp(\beta^T x_i)}{\sum_{j \in R_i} \exp(\beta^T x_j)} \right),$$
where $E$ is an index set of all patients who have events, and $R_i$ is an index set of patients who are at risk at the time $t_i$. This is the negative partial log-likelihood function due to the proportional hazards model proposed by [6], which typically appears in survival analysis.
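The three losses above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names, the list-of-lists data layout, and the toy data are assumptions made here, and the risk set is assumed to be $R_i = \{j : t_j \ge t_i\}$ with no special handling of tied times. The intercept cancels in the Cox ratio, so the sketch leaves it out of `cox_loss`. No care is taken about overflow in `exp` for large arguments.

```python
import math

def dot(beta, x):
    """Inner product beta^T x of two equal-length sequences."""
    return sum(b * v for b, v in zip(beta, x))

def ols_loss(beta, beta0, X, y):
    """Ordinary least squares loss (14.2): half the sum of squared residuals."""
    return 0.5 * sum((yi - dot(beta, xi) - beta0) ** 2 for xi, yi in zip(X, y))

def logistic_loss(beta, beta0, X, y):
    """Logistic loss for labels y_i in {-1, +1}."""
    return sum(math.log1p(math.exp(-yi * (dot(beta, xi) + beta0)))
               for xi, yi in zip(X, y))

def cox_loss(beta, X, t, e):
    """Negative partial log-likelihood of the proportional hazards model.

    Assumption: patient j is at risk at time t_i when t_j >= t_i.
    """
    total = 0.0
    for xi, ti, ei in zip(X, t, e):
        if ei != 1:
            continue  # only patients with an event contribute a term
        denom = sum(math.exp(dot(beta, xj)) for xj, tj in zip(X, t) if tj >= ti)
        total -= dot(beta, xi) - math.log(denom)
    return total

# Hypothetical toy data: three examples with p = 2 features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_real = [1.0, 2.0, 3.0]      # real-valued responses for least squares
y_sign = [1, -1, 1]           # binary responses for logistic regression
t = [2.0, 1.0, 3.0]           # survival times
e = [1, 0, 1]                 # event indicators
zero = [0.0, 0.0]

ols_at_zero = ols_loss(zero, 0.0, X, y_real)
logistic_at_zero = logistic_loss(zero, 0.0, X, y_sign)
cox_at_zero = cox_loss(zero, X, t, e)
```

At $\beta = 0$, $\beta_0 = 0$ every exponent vanishes, so each value can be checked by hand: the least squares loss is half the sum of squared responses, each logistic term equals $\log 2$, and each Cox term equals the logarithm of the size of its risk set.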
A common property of the three loss functions above is the convexity of $f$ in both of its arguments, $\beta \in \mathbb{R}^p$ and $\beta_0 \in \mathbb{R}$. A function $f(\beta)$ is convex in $\mathbb{R}^p$ if for any $\alpha \in [0, 1]$,
$$f(\alpha \beta + (1 - \alpha) \beta') \le \alpha f(\beta) + (1 - \alpha) f(\beta'), \quad \forall \beta, \beta' \in \mathbb{R}^p.$$
This is a desirable property, together with the convexity of the regularizer $\Psi$, since it facilitates finding a minimizer of (14.1).
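The convexity inequality can be probed numerically by sampling pairs of points and mixing weights. The sketch below does this for the logistic loss on randomly generated data. The helper name `convexity_gap`, the data sizes, and the random seed are all assumptions made for illustration. A finite random check of this kind can only fail to find a violation; it does not prove convexity.

```python
import math
import random

def logistic_loss(beta, beta0, X, y):
    """Logistic loss for labels y_i in {-1, +1}."""
    return sum(
        math.log1p(math.exp(-yi * (sum(b * v for b, v in zip(beta, xi)) + beta0)))
        for xi, yi in zip(X, y)
    )

def convexity_gap(f, u, v, alpha):
    """Right-hand side minus left-hand side of the convexity inequality.

    A nonnegative value means the inequality holds for this (u, v, alpha).
    """
    mix = [alpha * a + (1.0 - alpha) * b for a, b in zip(u, v)]
    return alpha * f(u) + (1.0 - alpha) * f(v) - f(mix)

random.seed(0)
p, n = 3, 20
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [random.choice([-1, 1]) for _ in range(n)]

def f(w):
    """Loss as a function of the stacked vector w = (beta, beta0)."""
    return logistic_loss(w[:-1], w[-1], X, y)

gaps = []
for _ in range(200):
    u = [random.gauss(0.0, 1.0) for _ in range(p + 1)]
    v = [random.gauss(0.0, 1.0) for _ in range(p + 1)]
    gaps.append(convexity_gap(f, u, v, random.random()))

min_gap = min(gaps)  # expected to be nonnegative up to rounding error

# A concave function violates the inequality, for contrast.
concave_gap = convexity_gap(lambda w: -w[0] ** 2, [1.0], [-1.0], 0.5)
```

Stacking $\beta$ and $\beta_0$ into one vector lets the same helper test convexity in both arguments at once, which is the property stated above.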
In our discussion, we only require that $f$ is convex and continuously differentiable, except that in some derivations we use the $f$ of the ordinary least squares because it leads to the simplest derivation.
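As a short worked illustration of why the least squares loss (14.2) keeps derivations simple, its partial derivatives follow directly from the chain rule. The residual symbol $r_i$ is shorthand introduced here and is not part of the original notation.

```latex
r_i = y_i - \beta^T x_i - \beta_0, \qquad
\nabla_\beta f(\beta, \beta_0) = -\sum_{i=1}^{n} r_i \, x_i, \qquad
\frac{\partial f}{\partial \beta_0}(\beta, \beta_0) = -\sum_{i=1}^{n} r_i .
```

Both expressions are affine in $(\beta, \beta_0)$, so the gradient is continuous everywhere, consistent with the differentiability requirement stated above.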