The bottom row of Fig. 3.27 illustrates the case of unequal class error probabilities (due to an increased $\omega_1$ support) with optimal split point $w_0 = 1$. Both the theoretical and empirical curves fail to find $w_0$. However, the empirical minimum occurs in a close neighborhood of $w_0$.
For Gaussian class-conditional distributions of the perceptron input there are no closed-form expressions for the entropies. Using numerical computation of the integrals, similar conclusions can be drawn (theoretical entropy maximum at the $\min P_e$ point; empirical entropy minimum close to the $\min P_e$ point).
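As an illustration of such a numerical computation, the following minimal sketch evaluates the Shannon entropy of the error of a single-input perceptron $y = \tanh(wx + w_0)$ with equal class priors; the Gaussian parameters, the grid, and the scan range are hypothetical choices, not values from the text. The error PDF is obtained by a change of variable and the entropy integral by the trapezoidal rule.

```python
import numpy as np

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def error_pdf(e, t, w, w0, mu, s):
    """PDF of E = t - tanh(w*X + w0) for X ~ N(mu, s^2), by change of variable."""
    y = t - e                     # perceptron output that yields error e
    f = np.zeros_like(e)
    ok = np.abs(y) < 1.0          # tanh range; density is zero elsewhere
    x = (np.arctanh(y[ok]) - w0) / w
    f[ok] = gauss(x, mu, s) / (np.abs(w) * (1.0 - y[ok] ** 2))
    return f

def shannon_entropy(w0, w=1.0, mu_pos=1.0, mu_neg=-1.0, s=1.0, p=0.5):
    """H_S = -integral of f_E ln f_E over the error support (-2, 2)."""
    e = np.linspace(-2 + 1e-6, 2 - 1e-6, 20001)
    f = (p * error_pdf(e, +1, w, w0, mu_pos, s)
         + (1 - p) * error_pdf(e, -1, w, w0, mu_neg, s))
    safe = np.where(f > 0, f, 1.0)        # ln(1) = 0 where the density vanishes
    return -np.trapz(f * np.log(safe), e)

# Scan the bias; for these symmetric (hypothetical) classes min P_e is at w0 = 0.
for w0 in np.linspace(-1.5, 1.5, 7):
    print(f"w0 = {w0:+.2f}   H_S = {shannon_entropy(w0):.4f}")
```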
3.6 Kernel Smoothing Revisited
The present section analyzes in greater detail the influence of kernel smoothing in the attainment of MEE solutions for continuous error classifiers.
analysis follows the exposition given in [212].
Consider the data splitter with Gaussian inputs as in the previous section.
Figure 3.28 illustrates the influence of kernel smoothing in the error PDF
estimation. The figure shows the theoretical and empirical PDFs for two
locations of the split point: off-optimal (3.28a) and optimal (3.28b). Note the
smoothing imposed by the kernel estimate: an increased $h$ implies a smoother
estimate with greater impact near the origin.
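The smoothing effect itself is easy to reproduce. The sketch below (a minimal example with a hypothetical error sample, drawn to loosely mimic the flat-plus-concentrated shape of Fig. 3.28a, and arbitrary bandwidths) computes the Gaussian-kernel Parzen estimate of the error PDF for a small and a large $h$:

```python
import numpy as np

def parzen_pdf(grid, sample, h):
    """Gaussian-kernel (Parzen window) density estimate on a grid."""
    d = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * d ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Hypothetical error sample: an almost-flat component for one class and a
# sharply concentrated one for the other.
errors = np.concatenate([rng.uniform(-1, 0, 500), rng.exponential(0.05, 500)])
grid = np.linspace(-1.5, 1.5, 601)
for h in (0.02, 0.3):                       # small vs. large bandwidth
    f_hat = parzen_pdf(grid, errors, h)
    print(f"h = {h}: peak of the estimate = {f_hat.max():.2f}")
```

With the larger bandwidth, the sharp component near the origin is flattened into its flat neighbor, which is the "coupling" discussed below.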
The bottom graphs of Fig. 3.28a and 3.28b are illustrative of why a theoretical maximum can change to an empirical minimum. The error PDF in (a) is almost uniform for class $\omega_{-1}$, implying a high value of $H_{S|-1}$; however, the error PDF for class $\omega_1$ is highly concentrated, implying a very low $H_{S|1}$; $f_{E|1}$ is clearly more concentrated than its left counterpart. Property 3 (Sect. 2.3.4) and formula (C.3) then give a plausible justification of why the overall $H_S$ turns out to be smaller for the off-optimal than for the optimal split point.
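To make the mechanism explicit, note that the two class-conditional error densities here have essentially disjoint supports (the errors of the two classes fall on opposite sides of the origin). Under that assumption, which is presumably what relation (C.3) captures for the mixture $f_E = p\,f_{E|1} + q\,f_{E|-1}$, the overall entropy splits as
\[
H_S = -\int f_E \ln f_E \, de = p\,H_{S|1} + q\,H_{S|-1} - p\ln p - q\ln q ,
\]
so a very low $H_{S|1}$ pulls the overall $H_S$ down even when $H_{S|-1}$ is high.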
With the error PDF estimated with a sufficiently high value of $h$ we get the sort of curves shown with a dotted line in the bottom graphs of (a) and (b). Kernel smoothing “couples” the class-conditional components of the error PDF, which is then seen as a “whole”, ignoring relation (C.3); the density for the non-optimal split now has a long tail, whereas the density of the optimal split is clearly more concentrated at the origin. As a consequence, a minimum of the entropy is obtained at the optimal split point. A similar maximum-to-minimum entropy flip due to kernel smoothing is observed in other classifiers, namely those discussed in the present chapter.
We now analyze the theoretical behavior of the kernel smoothing effect
on two distinct PDFs that resemble the ones portrayed in Fig. 3.28. One of
them, $f_1(x)$, corresponds to the off-optimal error PDF with a large tail for
one class and a fast-decaying trend for the other class, modeled as
\[
f_1(x;\lambda) = \tfrac{1}{2}\,u(x;-1,0) + \tfrac{1}{2}\,e_{+}(x;\lambda),
\]
where $u(x;-1,0)$ is the uniform density on $[-1,0]$ and $e_{+}(x;\lambda) = \lambda e^{-\lambda x}$, $x \ge 0$, is the exponential density.
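A quick numerical check of the coupling effect on this model can be done by direct convolution. The sketch below assumes the definitions of $u$ and $e_{+}$ above with a hypothetical $\lambda = 4$, and tracks the differential entropy of the kernel-smoothed density as $h$ grows:

```python
import numpy as np

def f1(x, lam=4.0):
    """Off-optimal error model: half uniform on [-1, 0], half exponential tail."""
    uni = np.where((x >= -1) & (x <= 0), 1.0, 0.0)
    expo = np.where(x >= 0, lam * np.exp(-lam * x), 0.0)
    return 0.5 * uni + 0.5 * expo

x = np.linspace(-4, 4, 8001)
dx = x[1] - x[0]
f = f1(x)
for h in (0.0, 0.1, 0.5):                 # h = 0 keeps the original density
    if h > 0:
        kern = np.exp(-0.5 * (x / h) ** 2) / (h * np.sqrt(2 * np.pi))
        g = np.convolve(f, kern, mode="same") * dx   # kernel-smoothed density
        g /= np.trapz(g, x)                          # renormalize on the grid
    else:
        g = f
    safe = np.where(g > 0, g, 1.0)
    print(f"h = {h}: H = {-np.trapz(g * np.log(safe), x):.4f}")
```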
 