out as follows: a) generate a large number of samples for each class-conditional input PDF; b) compute the output $f_{E|t}(e)$ PDFs with the KDE method; c) compute $H_S$ and $H_{R_2}$ using formulas (C.3) and (C.5) (entropy partitioning) and performing numerical integration. Although laborious, this procedure is able to produce accurate estimates of the theoretical EEs if one uses a "large number of samples" in step (a), that is, a value of $n$ in formula (3.2) guaranteeing a very low integrated mean square error (say, IMSE $< 0.01$) of $\hat{f}_{E|t}(e)$ when computed with the optimal $h = h(n)$ (see Appendix E). In these conditions $\hat{f}_{E|t}(e)$ is very close to $f_{E|t}(e)$.
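As an illustration of steps (a)-(c), the following sketch generates samples from two assumed Gaussian class-conditional input PDFs, estimates the class-conditional error PDFs by KDE, and obtains the Shannon and Rényi quadratic entropies by numerical integration of the mixture $f_E(e) = \sum_t p_t f_{E|t}(e)$. The classifier, priors, and bandwidth (scipy's default rather than the IMSE-optimal $h(n)$ of Appendix E) are illustrative assumptions, and direct integration of the mixture stands in for the exact partitioning formulas (C.3) and (C.5).

```python
# Minimal numerical sketch of steps (a)-(c); all modelling choices below are
# illustrative assumptions, not the book's exact setup.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
n = 100_000                                  # "large number of samples" per class
priors = {-1: 0.5, +1: 0.5}                  # assumed class priors

def classifier_output(x):
    # assumed classifier with [-1, 1] codomain
    return np.tanh(2.0 * x)

# (a) samples from each class-conditional input PDF (assumed N(-1,1) and N(+1,1))
inputs = {-1: rng.normal(-1.0, 1.0, n), +1: rng.normal(+1.0, 1.0, n)}

# (b) KDE estimates of the class-conditional error PDFs f_{E|t}(e), with e = t - y
errors = {t: t - classifier_output(x) for t, x in inputs.items()}
kde = {t: gaussian_kde(e) for t, e in errors.items()}

# (c) Shannon and Renyi quadratic entropies of f_E(e) by numerical integration
grid = np.linspace(-2.0, 2.0, 4001)          # error support for +/-1 targets
f_E = sum(priors[t] * kde[t](grid) for t in (-1, +1))
H_S = -trapezoid(f_E * np.log(f_E + 1e-300), grid)    # Shannon EE
H_R2 = -np.log(trapezoid(f_E ** 2, grid))             # Renyi quadratic EE
print(f"H_S = {H_S:.4f}  H_R2 = {H_R2:.4f}")
```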
The main differences between empirical and theoretical MEE are as follows:

- Whereas theoretical MEE implies the separate evaluation of the $f_{E|t}(e)$, empirical MEE relies on the estimate $\hat{f}_n(e)$ of the whole $f_E(e)$, based on the $n$-sized dataset (see the sketch after this list).
- One cannot apply iterative optimization algorithms to theoretical EEs (at each training step the $f_{E|t}(e)$ are not easily computable); one may, however, compute the theoretical EE in a neighborhood of a parameter vector, as we will do later.
- Whereas the kernel smoothing effect (see Appendix E) is negligible when using the optimal $h(n)$ in the computation of theoretical EEs, its influence will be of importance, as we shall see, in empirical MEE.
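The empirical EEs mentioned in the first item above can be sketched as follows. Since expressions (3.4) and (3.5) are not reproduced in this excerpt, the code assumes the usual plug-in forms: a resubstitution Shannon estimate based on $\hat{f}_n(e)$ and the information-potential estimate of the Rényi quadratic entropy, both computed from the whole $n$-sized error sample with a Gaussian kernel.

```python
# Sketch of empirical EEs from the whole n-sized error sample; the exact
# forms of (3.4) and (3.5) are assumed to be the usual plug-in estimators.
import numpy as np

def empirical_EEs(e, h):
    """e: 1-D array of the n error samples; h: kernel bandwidth."""
    diff = e[:, None] - e[None, :]                      # all pairwise e_i - e_j
    # Gaussian kernel values G_h(e_i - e_j)
    G = np.exp(-diff**2 / (2.0 * h**2)) / (h * np.sqrt(2.0 * np.pi))
    f_hat_at_samples = G.mean(axis=1)                   # \hat{f}_n(e_i)
    H_S_hat = -np.mean(np.log(f_hat_at_samples))        # resubstitution Shannon EE
    # Renyi quadratic EE via the information potential (bandwidth h*sqrt(2))
    h2 = h * np.sqrt(2.0)
    V = np.mean(np.exp(-diff**2 / (2.0 * h2**2)) / (h2 * np.sqrt(2.0 * np.pi)))
    H_R2_hat = -np.log(V)
    return H_S_hat, H_R2_hat
```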
We saw in Sect. 2.2.1 that the min $P_e = 0$ case for a classifier with interval codomain corresponds to an error PDF with a null area in a subset of the codomain. For instance, for the $[-1, 1]$ codomain and the usual thresholding function assigning the $-1$ label to negative outputs and the $1$ label to positive outputs, the classifier has $P_e = 0$ if the error PDF has null area in $[-2, -1] \cup [1, 2]$. In this case the task of the training algorithm is to squeeze $f_E(e)$ (or, more rigorously, $\hat{f}_n(e)$ for an arbitrarily large $n$) driving it inside $[-1, 1]$.
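This correspondence can be checked directly, assuming the error is defined as $e = t - y$ with targets $t \in \{-1, 1\}$ and classifier output $y \in [-1, 1]$:

$$
\begin{aligned}
t = +1:&\quad y > 0 \Rightarrow e = 1 - y \in [0, 1), \qquad\; y < 0 \Rightarrow e = 1 - y \in (1, 2],\\
t = -1:&\quad y < 0 \Rightarrow e = -1 - y \in (-1, 0], \quad y > 0 \Rightarrow e = -1 - y \in [-2, -1).
\end{aligned}
$$

Misclassified cases are exactly those with $e \in [-2, -1) \cup (1, 2]$, so $P_e = 0$ precisely when $f_E(e)$ carries no mass outside $[-1, 1]$.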
Classifiers trained to reach the minimum error entropy should ideally, for infinitely separated classes, achieve more than that: $f_E(e)$ should be driven towards a single Dirac-$\delta$ at the origin (see Sect. 2.3.1). For other codomain intervals the same result should ideally be obtained by MEE-trained classifiers. One question that arises in practical terms is whether, when all $n$ error samples have equal value, $e_1 = \ldots = e_n$, and the empirical EEs (3.4) and (3.5) are used instead of the theoretical EEs, this configuration is still a minimum. In other words, do empirical EEs preserve the minimum property of the theoretical EEs? The answer is affirmative: the condition $e_1 = \ldots = e_n = 0$ is a minimum, and in fact a global minimum, of the empirical Shannon EE [212].
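As a quick numerical check of this property, one can reuse the empirical_EEs sketch given earlier (and therefore inherit its assumption about the exact form of (3.4)): for a Gaussian kernel, the all-equal configuration attains the lower bound $\log(h\sqrt{2\pi})$ of the plug-in Shannon estimate, whereas any spread-out error sample yields a larger value.

```python
# Illustration (not a proof, which is given in [212]) that equal errors attain
# the minimum of the plug-in empirical Shannon EE sketched above.
import numpy as np

h = 0.1
e_equal = np.zeros(1000)                          # e_1 = ... = e_n = 0
e_spread = np.random.default_rng(1).normal(0.0, 0.5, 1000)

H_equal, _ = empirical_EEs(e_equal, h)            # equals log(h*sqrt(2*pi))
H_spread, _ = empirical_EEs(e_spread, h)
print(H_equal, np.log(h * np.sqrt(2 * np.pi)), H_spread)   # H_equal <= H_spread
```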