$$R = \sum_{i=1} \sum_{k=1} f\left(|t_{ik} - y_{ik}|\right).$$
With these conditions, [89] shows that for outputs $y_k \in [0, 1]$, $f$ must
asymptotically (in the limit of infinite data) satisfy
$$\frac{f'(1 - y)}{f'(y)} = \frac{1 - y}{y},$$
a condition met by the class of functions
$$f(y) = \int y^{r} (1 - y)^{r-1} \, dy .$$
For $r = 1$ the square-error function is obtained. For $r = 0$ one obtains
$f(y) = -\ln(1 - y)$, leading to the CE risk (see also [26]).
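As a quick check (a worked verification added here, not reproduced from [89]), differentiating the family $f(y) = \int y^{r}(1-y)^{r-1}\,dy$ shows that it meets the condition $f'(1-y)/f'(y) = (1-y)/y$ for any $r$, and recovers the two special cases up to additive constants of integration:

```latex
\begin{align*}
f'(y) &= y^{r}(1-y)^{r-1}, \qquad f'(1-y) = (1-y)^{r}\,y^{r-1},\\
\frac{f'(1-y)}{f'(y)} &= \frac{(1-y)^{r}\,y^{r-1}}{y^{r}\,(1-y)^{r-1}} = \frac{1-y}{y},\\
r = 1:\quad f(y) &= \int y\,dy = \tfrac{1}{2}\,y^{2},\\
r = 0:\quad f(y) &= \int \frac{dy}{1-y} = -\ln(1-y).
\end{align*}
```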
Again, this favorable scenario, which implies that a classifier using MSE or CE
would be able to attain Bayes performance provided it had a sufficiently
complex architecture, is never verified in practice for several reasons
(explained in the previous section), the most obvious one being that target
components are not independent.
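The idealized claim can be illustrated numerically in the simplest setting. The sketch below assumes a single output, a 0/1 target coding, and a known posterior p = P(T = 1 | x); under those assumptions the output minimizing the expected loss should coincide with p for both the square-error and the CE forms of f. The function names (f_square, f_ce, expected_loss, best_output) are hypothetical and chosen only for this example; it says nothing about the finite-data, limited-architecture case discussed above.

```python
import math


def f_square(d):
    """Square-error member of the family (r = 1): f(d) = d^2 / 2."""
    return 0.5 * d * d


def f_ce(d):
    """Cross-entropy member of the family (r = 0): f(d) = -ln(1 - d), 0 <= d < 1."""
    return -math.log(1.0 - d)


def expected_loss(f, y, p):
    """Expected loss at output y when the 0/1 target equals 1 with probability p."""
    return p * f(abs(1.0 - y)) + (1.0 - p) * f(abs(0.0 - y))


def best_output(f, p, steps=9999):
    """Grid search over (0, 1) for the output that minimizes the expected loss."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda y: expected_loss(f, y, p))


if __name__ == "__main__":
    # Assumed posterior values, picked arbitrarily for illustration.
    for p in (0.2, 0.5, 0.7):
        print(p, best_output(f_square, p), best_output(f_ce, p))
```

In both cases the grid minimizer lands on (or within grid resolution of) the assumed posterior, which is the sense in which MSE and CE outputs estimate posterior probabilities in the infinite-data limit.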
2.2 Risk Functionals Reappraised
Risk functionals are usually presented and analyzed in the literature as
expectations of loss functions relative to joint distributions in the
$X \times T$ space. We may, however, express risk functionals and analyze their
properties relative to other spaces, functionally dependent on $X$ and $T$. In
fact, in order to appreciate how the various risk functionals cope with the
classifier problem, it is obviously advantageous to express them in terms of
the error r.v. [150,219]. We now proceed to do exactly this for the MSE and CE
risk functionals. For simplicity, we will restrict the analysis to two-class
problems.
2.2.1 The Error Distribution
Assuming w.l.o.g. a $\{-1, 1\}$ coding of the targets, we first derive the
cumulative distribution function of the error r.v. (footnote 2), $E = T - Y$,
denoting by $p = P(T = 1)$ and $q = 1 - p = P(T = -1)$ the class priors, as
follows:
2 The “error” r.v. is indeed a deviation r.v., not the misclassification rate r.v.