R = \sum_{i=1}^{n} \sum_{k=1}^{c} f(|t_{ik} - y_{ik}|) .   (2.17)
With these conditions, [89] shows that for outputs y_k \in [0, 1], f must asymptotically (in the limit of infinite data) satisfy

\frac{f'(1 - y)}{f'(y)} = \frac{1 - y}{y} ,   (2.18)

implying

f(y) = \int y^r (1 - y)^{r-1} \, dy .   (2.19)
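As a quick sanity check (our verification step, not a display from the source): differentiating (2.19) gives f'(y) = y^r (1 - y)^{r-1}, and therefore

\frac{f'(1 - y)}{f'(y)} = \frac{(1 - y)^r \, y^{r-1}}{y^r \, (1 - y)^{r-1}} = \frac{1 - y}{y} ,

so any member of the family (2.19) indeed satisfies condition (2.18).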
For r = 1 the square-error function is obtained. For r = 0 one obtains f(y) = -\ln(1 - |y|), leading to the CE risk (see also [26]).
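Spelling out these two cases (a worked step we add for clarity; the source states only the results): for r = 1, f(y) = \int y \, dy = y^2/2, the square error up to the usual factor of 1/2. For r = 0,

f(y) = \int \frac{dy}{1 - y} = -\ln(1 - y) ,

and, with the argument |t - y| of (2.17) and the 0/1 target coding appropriate for outputs in [0, 1], the loss -\ln(1 - |t - y|) equals -\ln(y) when t = 1 and -\ln(1 - y) when t = 0, which are exactly the cross-entropy terms.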
Again, this favorable scenario, implying that a classifier using MSE or CE would be able to attain Bayes performance provided it had a sufficiently complex architecture, is never verified in practice, for several reasons (explained in the previous section), the most obvious one being that target components are not independent.
2.2 Risk Functionals Reappraised
Risk functionals are usually presented and analyzed in the literature as expectations of loss functions relative to joint distributions in the X × T space. We may, however, express risk functionals and analyze their properties relative to other spaces, functionally dependent on X and T. In fact, in order to appreciate how the various risk functionals cope with the classifier problem, it is obviously advantageous to express them in terms of the error r.v. [150, 219]. We now proceed to do exactly this for the MSE and CE risk functionals. For simplicity, we will restrict the analysis to two-class problems.
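For instance (our illustration, anticipating the error r.v. E = T - Y defined below), the MSE risk can be written as an expectation over the error alone,

R_{MSE} = \mathbb{E}[(T - Y)^2] = \mathbb{E}[E^2] = \int e^2 \, dF_E(e) ,

so its behavior is governed entirely by the distribution of E.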
2.2.1 The Error Distribution
Assuming w.l.o.g. a {−1, 1} coding of the targets, we first derive the cumulative distribution function of the error² r.v., E = T − Y, denoting by p = P(T = 1) and q = 1 − p = P(T = −1) the class priors, as follows:
² The “error” r.v. is indeed a deviation r.v., not the misclassification rate r.v.
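The derivation then proceeds by conditioning on the class label. As a sketch of that first step (our paraphrase of the standard conditioning argument, not the source's display):

F_E(e) = P(T - Y \le e) = p \, P(Y \ge 1 - e \mid T = 1) + q \, P(Y \ge -1 - e \mid T = -1) ,

where the two conditional probabilities involve only the class-conditional distributions of the output Y.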
 