The bottom row of Fig. 3.27 illustrates the case of unequal class error probabilities (due to an increased $\omega_1$ support) with optimal split point $w_0 = 1$. Both the theoretical and empirical curves fail to find $w_0$. However, the empirical minimum occurs in a close neighborhood of $w_0$.
For Gaussian class-conditional distributions of the perceptron input there are no closed-form expressions for the entropies. Using numerical computation of the integrals, similar conclusions can be drawn (theoretical entropy maximum at the $\min P_e$ point; empirical entropy minimum close to the $\min P_e$ point).
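As an illustration of such a numerical computation, the following minimal sketch evaluates the Shannon entropy of the error of a single-input perceptron $y = \tanh(wx + w_0)$ with equal class priors; the Gaussian parameters, the grid, and the scan range are hypothetical choices, not values from the text. The error PDF is obtained by a change of variable and the entropy integral by the trapezoidal rule.

```python
import numpy as np

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def error_pdf(e, t, w, w0, mu, s):
    """PDF of E = t - tanh(w*X + w0) for X ~ N(mu, s^2), by change of variable."""
    y = t - e                     # perceptron output that yields error e
    f = np.zeros_like(e)
    ok = np.abs(y) < 1.0          # tanh range; density is zero elsewhere
    x = (np.arctanh(y[ok]) - w0) / w
    f[ok] = gauss(x, mu, s) / (np.abs(w) * (1.0 - y[ok] ** 2))
    return f

def shannon_entropy(w0, w=1.0, mu_pos=1.0, mu_neg=-1.0, s=1.0, p=0.5):
    """H_S = -integral of f_E ln f_E over the error support (-2, 2)."""
    e = np.linspace(-2 + 1e-6, 2 - 1e-6, 20001)
    f = (p * error_pdf(e, +1, w, w0, mu_pos, s)
         + (1 - p) * error_pdf(e, -1, w, w0, mu_neg, s))
    safe = np.where(f > 0, f, 1.0)        # ln(1) = 0 where the density vanishes
    return -np.trapz(f * np.log(safe), e)

# Scan the bias; for these symmetric (hypothetical) classes min P_e is at w0 = 0.
for w0 in np.linspace(-1.5, 1.5, 7):
    print(f"w0 = {w0:+.2f}   H_S = {shannon_entropy(w0):.4f}")
```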
3.6 Kernel Smoothing Revisited
The present section analyzes in greater detail the influence of kernel smoothing in the attainment of MEE solutions for continuous error classifiers.
analysis follows the exposition given in [212].
Consider the data splitter with Gaussian inputs as in the previous section.
Figure 3.28 illustrates the influence of kernel smoothing in the error PDF
estimation. The figure shows the theoretical and empirical PDFs for two
locations of the split point: off-optimal (3.28a) and optimal (3.28b). Note the
smoothing imposed by the kernel estimate: an increased $h$ implies a smoother
estimate with greater impact near the origin.
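The smoothing effect itself is easy to reproduce. The sketch below (a minimal example with a hypothetical error sample, drawn to loosely mimic the flat-plus-concentrated shape of Fig. 3.28a, and arbitrary bandwidths) computes the Gaussian-kernel Parzen estimate of the error PDF for a small and a large $h$:

```python
import numpy as np

def parzen_pdf(grid, sample, h):
    """Gaussian-kernel (Parzen window) density estimate on a grid."""
    d = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * d ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Hypothetical error sample: an almost-flat component for one class and a
# sharply concentrated one for the other.
errors = np.concatenate([rng.uniform(-1, 0, 500), rng.exponential(0.05, 500)])
grid = np.linspace(-1.5, 1.5, 601)
for h in (0.02, 0.3):                       # small vs. large bandwidth
    f_hat = parzen_pdf(grid, errors, h)
    print(f"h = {h}: peak of the estimate = {f_hat.max():.2f}")
```

With the larger bandwidth, the sharp component near the origin is flattened into its flat neighbor, which is the "coupling" discussed below.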
The bottom graphs of Fig. 3.28a and 3.28b are illustrative of why a theoretical maximum can change to an empirical minimum. The error PDF in (a) is almost uniform for class $\omega_{-1}$, implying a high value of $H_{S|-1}$; however, the error PDF for class $\omega_1$ is highly concentrated, implying a very low $H_{S|1}$; $f_{E|1}$ is clearly more concentrated than its left counterpart. Property 3 (Sect. 2.3.4) and formula (C.3) then give a plausible justification of why the overall $H_S$ turns out to be smaller for the off-optimal than for the optimal split point.
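To make the mechanism explicit, note that the two class-conditional error densities here have essentially disjoint supports (the errors of the two classes fall on opposite sides of the origin). Under that assumption, which is presumably what relation (C.3) captures for the mixture $f_E = p\,f_{E|1} + q\,f_{E|-1}$, the overall entropy splits as
\[
H_S = -\int f_E \ln f_E \, de = p\,H_{S|1} + q\,H_{S|-1} - p\ln p - q\ln q ,
\]
so a very low $H_{S|1}$ pulls the overall $H_S$ down even when $H_{S|-1}$ is high.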
With the error PDF estimated with a sufficiently high value of $h$ we get the sort of curves shown with a dotted line in the bottom graphs of (a) and (b). Kernel smoothing “couples” the class-conditional components of the error PDF, which is then seen as a “whole”, ignoring relation (C.3); the density for the non-optimal split now has a long tail, whereas the density of the optimal split is clearly more concentrated at the origin. As a consequence, a minimum of the entropy is obtained at the optimal split point. A similar maximum-to-minimum entropy flip due to kernel smoothing is observed in other classifiers, namely those discussed in the present chapter.
We now analyze the theoretical behavior of the kernel smoothing effect
on two distinct PDFs that resemble the ones portrayed in Fig. 3.28. One of
them, $f_1(x)$, corresponds to the off-optimal error PDF with a large tail for
one class and a fast-decaying trend for the other class, modeled as
\[
f_1(x;\lambda) = \tfrac{1}{2}\,u(x;-1,0) + \tfrac{1}{2}\,e_{+}(x;\lambda),
\]
where $u(x;-1,0)$ is the uniform density on $[-1,0]$ and $e_{+}(x;\lambda) = \lambda e^{-\lambda x}$, $x \ge 0$, is the exponential density.
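A quick numerical check of the coupling effect on this model can be done by direct convolution. The sketch below assumes the definitions of $u$ and $e_{+}$ above with a hypothetical $\lambda = 4$, and tracks the differential entropy of the kernel-smoothed density as $h$ grows:

```python
import numpy as np

def f1(x, lam=4.0):
    """Off-optimal error model: half uniform on [-1, 0], half exponential tail."""
    uni = np.where((x >= -1) & (x <= 0), 1.0, 0.0)
    expo = np.where(x >= 0, lam * np.exp(-lam * x), 0.0)
    return 0.5 * uni + 0.5 * expo

x = np.linspace(-4, 4, 8001)
dx = x[1] - x[0]
f = f1(x)
for h in (0.0, 0.1, 0.5):                 # h = 0 keeps the original density
    if h > 0:
        kern = np.exp(-0.5 * (x / h) ** 2) / (h * np.sqrt(2 * np.pi))
        g = np.convolve(f, kern, mode="same") * dx   # kernel-smoothed density
        g /= np.trapz(g, x)                          # renormalize on the grid
    else:
        g = f
    safe = np.where(g > 0, g, 1.0)
    print(f"h = {h}: H = {-np.trapz(g * np.log(safe), x):.4f}")
```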
 