$$
R_L(Y) = \sum_{t \in T} P(t)\, E_{Y|t}[L(t, Y)]
       = \sum_{t \in \{-1,1\}} P(t) \int_{-1}^{1} L(t, y)\, f_{Y|t}(y)\, dy ,
\qquad (2.25)
$$

if the absolute integrability condition (2.23) for $L(t, y)$ is satisfied.
Applying again Theorem 2.1 to $E = T - Y$, the risk functional (2.25) is finally expressed in terms of the error variable as

$$
R_L(E) = \sum_{t \in \{-1,1\}} P(t) \int_{t-1}^{t+1} L(t, e)\, f_{E|t}(e)\, de .
\qquad (2.26)
$$
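As a quick numerical illustration of the change of variables behind (2.25) and (2.26), the following Python sketch estimates both forms by Monte Carlo for the squared loss. The class priors and the class-conditional output model are assumptions made for this example only; the two estimates should agree up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class setup with {-1, 1} targets; the priors and the
# class-conditional output model below are assumptions for this sketch.
priors = {-1: 0.4, 1: 0.6}
n = 200_000

def sample_y_given_t(t, size):
    # Assumed class-conditional output distribution f_{Y|t}, restricted to [-1, 1].
    return np.clip(rng.normal(loc=0.5 * t, scale=0.4, size=size), -1.0, 1.0)

# Risk as in (2.25): average of L(t, y) = (t - y)^2 over Y given t, weighted by P(t).
risk_y = sum(p * np.mean((t - sample_y_given_t(t, n)) ** 2) for t, p in priors.items())

# Risk as in (2.26): average of L(t, e) = e^2 over E given t, with E = T - Y.
risk_e = 0.0
for t, p in priors.items():
    e = t - sample_y_given_t(t, n)      # error samples, supported on [t-1, t+1]
    risk_e += p * np.mean(e ** 2)

print(risk_y, risk_e)                   # agreement up to Monte Carlo noise
```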
For MSE, $L_{SE}(t, e) = (t - y)^2 = e^2$ depends only on $e$ (or, in more detail, $e_w = t - y_w$). We then have $R_{MSE}(E) = E_{T,E}[E^2]$, the second-order moment of the error, which is empirically estimated as in (2.5) and can be rewritten as

$$
R_{MSE}(Y) = \frac{1}{n}\left[ \sum_{t_i = -1} (t_i - y_i)^2 + \sum_{t_i = 1} (t_i - y_i)^2 \right].
\qquad (2.27)
$$
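A minimal sketch of the empirical estimate (2.27) on made-up targets and outputs (illustrative values only, not from the text); the class-split form gives the same value as the pooled mean of the squared errors $e_i = t_i - y_i$.

```python
import numpy as np

# Illustrative data: targets in {-1, 1} and classifier outputs in [-1, 1] (made up).
t = np.array([-1, -1, 1, 1, 1])
y = np.array([-0.8, -0.2, 0.9, 0.4, 0.7])
n = len(t)

# Class-split form of (2.27): one sum per target value, divided by n.
r_mse_split = (((t[t == -1] - y[t == -1]) ** 2).sum()
               + ((t[t == 1] - y[t == 1]) ** 2).sum()) / n

# Pooled form: mean of the squared errors e_i = t_i - y_i.
r_mse_pooled = np.mean((t - y) ** 2)

print(r_mse_split, r_mse_pooled)   # identical up to floating-point rounding
```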
Let us now consider the cross-entropy risk whose empirical estimate is given by formula (2.16). For a two-class problem and the $\{0, 1\}$-coding scheme one obtains the following popularized expression, when the classifier has a single output:

$$
R_{CE}(Y) = -\sum_{t_i = 0} (1 - t_i)\ln(1 - y_i) - \sum_{t_i = 1} t_i \ln(y_i) .
\qquad (2.28)
$$
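The sketch below evaluates (2.28) on made-up $\{0, 1\}$-coded data (illustrative values only). Since the factors $(1 - t_i)$ and $t_i$ equal one on their respective sums, the class-split form coincides with the usual pooled cross-entropy sum.

```python
import numpy as np

# Illustrative {0, 1}-coded targets and single-output classifier values in (0, 1) (made up).
t = np.array([0, 0, 1, 1, 1])
y = np.array([0.1, 0.3, 0.8, 0.6, 0.9])

# Class-split form of (2.28); the factors (1 - t_i) and t_i are 1 on their own sums.
r_ce = (-np.sum((1 - t[t == 0]) * np.log(1 - y[t == 0]))
        - np.sum(t[t == 1] * np.log(y[t == 1])))

# Equivalent pooled cross-entropy sum over all cases.
r_ce_pooled = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

print(r_ce, r_ce_pooled)   # the two values coincide
```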
The $\{-1, 1\}$-coding implies a $y \rightarrow (y+1)/2$ transformation; formula (2.28) is then rewritten as

$$
R_{CE}(Y) = -\sum_{t_i = -1} \ln\frac{1 - y_i}{2} - \sum_{t_i = 1} \ln\frac{1 + y_i}{2} .
\qquad (2.29)
$$
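A small check of this recoding step, again on made-up data: evaluating (2.29) directly in the $\{-1, 1\}$ coding gives the same number as mapping targets and outputs through $y \rightarrow (y+1)/2$ and applying (2.28).

```python
import numpy as np

# Illustrative {-1, 1}-coded targets and outputs in (-1, 1) (made up).
t = np.array([-1, -1, 1, 1, 1])
y = np.array([-0.8, -0.4, 0.6, 0.2, 0.8])

# Direct evaluation of (2.29) in the {-1, 1} coding.
r_ce_pm1 = (-np.sum(np.log((1 - y[t == -1]) / 2))
            - np.sum(np.log((1 + y[t == 1]) / 2)))

# Same value via the {0, 1} coding and formula (2.28).
t01, y01 = (t + 1) / 2, (y + 1) / 2
r_ce_01 = -np.sum(np.log(1 - y01[t01 == 0])) - np.sum(np.log(y01[t01 == 1]))

print(r_ce_pm1, r_ce_01)   # identical up to floating-point rounding
```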
When multiplied by $1/n$, $R_{CE}(Y)$ can be viewed as the empirical estimate of the following (theoretical) risk functional:
$$
R_{CE}(Y) = -P(-1) \int_{-1}^{1} \ln(1 - y)\, f_{Y|-1}(y)\, dy
            - P(1) \int_{-1}^{1} \ln(1 + y)\, f_{Y|1}(y)\, dy + \ln(2) .
\qquad (2.30)
$$
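To illustrate this estimation statement, the sketch below draws a large labelled sample from an assumed class-conditional output density, $f_{Y|t}(y) = \tfrac{3}{4}(1 + ty)(1 - y^2)$ on $[-1, 1]$ (an assumption for this example, together with the priors), and compares the empirical sum (2.29) scaled by $1/n$ with the value of (2.30) obtained by numerical integration.

```python
import numpy as np

rng = np.random.default_rng(1)
priors = {-1: 0.4, 1: 0.6}               # assumed class priors P(t)

def f_y_given_t(y, t):
    # Assumed class-conditional density on [-1, 1]; integrates to one for t in {-1, 1}.
    return 0.75 * (1.0 + t * y) * (1.0 - y ** 2)

def sample_y_given_t(t, size):
    # Rejection sampling from the assumed density (envelope 1.5 exceeds its maximum).
    out = np.empty(0)
    while out.size < size:
        cand = rng.uniform(-1.0, 1.0, size)
        keep = rng.uniform(0.0, 1.5, size) < f_y_given_t(cand, t)
        out = np.concatenate([out, cand[keep]])
    return out[:size]

# Draw a labelled sample of size n.
n = 100_000
t = rng.choice([-1, 1], size=n, p=[priors[-1], priors[1]])
y = np.empty(n)
for c in (-1, 1):
    y[t == c] = sample_y_given_t(c, np.count_nonzero(t == c))

# Empirical sum (2.29), scaled by 1/n.
r_emp = (-np.sum(np.log((1 - y[t == -1]) / 2))
         - np.sum(np.log((1 + y[t == 1]) / 2))) / n

# Theoretical value (2.30) by numerical integration (open grid avoids log(0) at the ends).
g = np.linspace(-1.0, 1.0, 400001)[1:-1]
r_theo = (-priors[-1] * np.trapz(np.log(1 - g) * f_y_given_t(g, -1), g)
          - priors[1] * np.trapz(np.log(1 + g) * f_y_given_t(g, 1), g)
          + np.log(2.0))

print(r_emp, r_theo)                      # close for large n
```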
Applying the same variable transformation as we did before, the CE risk
functional is finally expressed in terms of the error variable as
$$
R_{CE}(E) = \sum_{t \in \{-1,1\}} P(t) \int_{t-1}^{t+1} \ln\frac{1}{2 - te}\, f_{E|t}(e)\, de + \ln(2) .
\qquad (2.31)
$$
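As a final consistency check, the sketch below evaluates (2.30) and (2.31) by numerical integration under the same assumed density used above, $f_{Y|t}(y) = \tfrac{3}{4}(1 + ty)(1 - y^2)$, with $f_{E|t}(e) = f_{Y|t}(t - e)$ on $[t-1, t+1]$; the two forms should produce the same value.

```python
import numpy as np

priors = {-1: 0.4, 1: 0.6}                # assumed class priors P(t)

def f_y_given_t(y, t):
    # Assumed class-conditional output density on [-1, 1] (illustration only).
    return 0.75 * (1.0 + t * y) * (1.0 - y ** 2)

# (2.30): CE risk written in terms of the output variable Y.
y = np.linspace(-1.0, 1.0, 400001)[1:-1]  # open grid avoids log(0) at the endpoints
r_ce_y = (-priors[-1] * np.trapz(np.log(1 - y) * f_y_given_t(y, -1), y)
          - priors[1] * np.trapz(np.log(1 + y) * f_y_given_t(y, 1), y)
          + np.log(2.0))

# (2.31): the same risk written in terms of the error E = T - Y,
# using f_{E|t}(e) = f_{Y|t}(t - e) on [t-1, t+1].
r_ce_e = np.log(2.0)
for t, p in priors.items():
    e = np.linspace(t - 1.0, t + 1.0, 400001)[1:-1]
    r_ce_e += p * np.trapz(np.log(1.0 / (2.0 - t * e)) * f_y_given_t(t - e, t), e)

print(r_ce_y, r_ce_e)                     # the two values should agree closely
```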