2.84, than the latter, 1 + ln(2) = 2.41. In spite of these difficulties, one can still present three properties of H_S that are useful in guiding the search for experimental settings and in the interpretation of experimental results:
1. H_S(E) is invariant to translations and partitions of f_E(e). This property, a direct consequence of the well-known invariance of entropy to translations and of a result on density partitions presented in Appendix C, is illustrated in Fig. 2.6. It may lead MEE to perform worse than MMSE or MCE in cases where the same probability of error corresponds to distinct error configurations.
2. For a large number of PDF families, H_S increases with the variance. This property (illustrated in Fig. 2.6) is analyzed in Appendix D and is associated with the common idea that, within the same density family, "longer tails" (resulting in larger variance) imply larger entropy. Although there are exceptions to this rule, one can use it quite safely for f_E(e) densities. As a consequence, one can say that MEE favors the "order" of the errors (large H_S meaning "disorderly" errors, i.e., errors not concentrated at one location).
3. Whenever f_E(e) has two components of equal functional form and equal priors, for a large number of f_E(e) families H_S(E) will decrease when the smaller (or equal) variance component decreases while the variance of the other component increases by the same amount (keeping the functional form unchanged). This property (illustrated in Fig. 2.6) is a consequence of the fact that entropy is an up-saturating function of the variance for a large number of PDF families (see Appendix D, where the definition of up-saturating function is also presented); therefore, when the larger variance component dilates, the corresponding increase in entropy is outweighed by the decrease in entropy of the other component. As a consequence of this property, MEE is more tolerant than MMSE or MCE to tenuous tails or outlier errors, as exemplified in the following section; a numerical sketch of properties 2 and 3 follows this list.
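The sketch below is not from the text; it is a minimal numerical illustration of properties 2 and 3, assuming Gaussian error components with equal priors and evaluating H_S = -∫ f_E(e) ln f_E(e) de by simple quadrature. The component means and variances are arbitrary choices made only for illustration.

```python
import numpy as np

def shannon_entropy(pdf, lo=-20.0, hi=20.0, n=40001):
    """H_S = -integral of f(e) ln f(e) de, approximated on a fine grid."""
    e = np.linspace(lo, hi, n)
    f = pdf(e)
    integrand = np.zeros_like(f)
    mask = f > 0                          # skip points where f underflows to 0
    integrand[mask] = -f[mask] * np.log(f[mask])
    return integrand.sum() * (e[1] - e[0])

def gauss(e, mu, var):
    return np.exp(-(e - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Property 2: within one density family, larger variance -> larger entropy.
for var in (0.5, 1.0, 2.0):
    h = shannon_entropy(lambda e: gauss(e, 0.0, var))
    print(f"single Gaussian, var={var}: H_S = {h:.3f}")

# Property 3: two equal-prior components; moving variance from one component
# to the other (by the same amount) lowers the overall entropy.
def two_components(e, delta):
    return 0.5 * gauss(e, -3.0, 1.0 - delta) + 0.5 * gauss(e, 3.0, 1.0 + delta)

for delta in (0.0, 0.5, 0.9):
    h = shannon_entropy(lambda e: two_components(e, delta))
    print(f"two components, delta={delta}: H_S = {h:.3f}")
```

With these settings, the single-component entropy grows with the variance while the two-component entropy shrinks as delta grows, in line with properties 2 and 3.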
Example 2.4. We now compute the Shannon entropy of the error for the dataset of Example 2.3. Since f_E(e) = u(e; -d, d) and the Shannon entropy of u(x; a, b) is ln(b - a), we have

H_S(E) = ln(2d) = (1/2) ln(12σ²).
The entropy H_S(E) is an up-saturating function (see the definition in Appendix D) of the variance σ². Moreover, H_S(E) penalizes the maximum separation of the classes (f_E(e) = δ(e), obtained for d = 0) infinitely less than the total overlap (f_E(e) = u(e; -1, 1)).
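As a quick sanity check (not from the text), the identity ln(2d) = (1/2) ln(12σ²) can be verified numerically for a few values of d, using the fact that the variance of u(e; -d, d) is σ² = d²/3; note also that ln(2d) → -∞ as d → 0, which is the sense in which maximum class separation is penalized "infinitely less".

```python
import numpy as np

# Check of Example 2.4: for f_E(e) = u(e; -d, d) the Shannon entropy is
# ln(b - a) = ln(2d), and with sigma^2 = d^2 / 3 this equals (1/2) ln(12 sigma^2).
for d in (0.25, 1.0, 3.0):
    var = d ** 2 / 3.0                      # variance of u(e; -d, d)
    h_from_width = np.log(2.0 * d)          # ln(2d)
    h_from_var = 0.5 * np.log(12.0 * var)   # (1/2) ln(12 sigma^2)
    print(f"d={d}: ln(2d)={h_from_width:.4f}  0.5*ln(12*var)={h_from_var:.4f}")
```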
The error PDF family of Examples 2.3 and 2.4, parameterized by d, was such that both the MSE and SEE risks attained their minimum value for d = 0. Both methods agreed on which PDF was most concentrated. One may