2. For moderate $h$ (say $h = 2$ as in Fig. 5.5e), $\psi_{ZED}$ shows a sigmoidal-type shape and, as in $\psi_{MSE}$ and $\psi_{CE}$, larger errors contribute larger weights.
Note, however, the contrast with $\psi_{CE}$: for larger errors $\psi_{CE}$ “accelerates” the weight value while $\psi_{ZED}$ “decelerates” it.
3. For larger values of $h$, $\psi_{ZED}$ behaves like $\psi_{MSE}$, as illustrated in Fig. 5.5f. In fact, $\lim_{h \to +\infty} \psi_{ZED} = \psi_{MSE}$.
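This limiting behavior can be checked numerically. Assuming for illustration the Gaussian-kernel form $\psi_{ZED}(e) \propto e\,e^{-e^2/(2h^2)}$ (normalization constants omitted; the exact expression in the text may differ by constant factors), the ratio to the MSE weight $\psi_{MSE}(e) \propto e$ tends to 1 as $h$ grows, while for small $h$ the exponential factor produces the “decelerating” behavior noted above:

```python
import math

def psi_zed(e, h):
    # Assumed Gaussian-kernel form of the ZED weight (constants omitted):
    # the factor exp(-e^2 / (2 h^2)) "decelerates" the weight for large |e|.
    return e * math.exp(-e * e / (2 * h * h))

def psi_mse(e):
    # MSE weight grows linearly with the error
    return e

for h in (0.5, 2.0, 100.0):
    # Ratio psi_zed / psi_mse = exp(-e^2 / (2 h^2)); approaches 1 as h -> infinity
    ratios = [psi_zed(e, h) / psi_mse(e) for e in (0.5, 1.0, 1.5)]
    print(h, [round(r, 4) for r in ratios])
```

For moderate $h$ the ratio decays with $|e|$ (deceleration), whereas for large $h$ it is essentially 1 for all errors of interest, i.e., $\psi_{ZED}$ becomes indistinguishable from $\psi_{MSE}$.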
Despite the disadvantage of $R_{ZED}$ over $R_{MSE}$ and $R_{CE}$ in having to set $h$, it is important to emphasize that we are not concerned with obtaining a good estimate of $f_E(0)$, but only with forcing it to be as high as possible. This means that we can set some moderately high value for $h$, with the advantage of being able to adapt it, and thus control how $\psi_{ZED}$ behaves, for each classification problem at hand.
Moreover, the second basic behavior above suggests that the “decelerated” characteristic of $\psi_{ZED}$ reduces the sensitivity of $R_{ZED}$ to outliers (the degree of sensitivity being controlled by $h$) when compared to the other alternative risks. This is illustrated in the following example.
Example 5.5. Consider discriminating two classes with bivariate input data $\mathbf{x} = [x_1\ x_2]^T$, with circular uniform distribution (see Example 3.8 in Sect. 3.3.1) and the following parameters:
$$\mu_{-1} = [0\ 0]^T, \quad \mu_{1} = [1.1\ 0]^T, \quad r_{-1} = r_{1} = 1. \qquad (5.26)$$
By symmetry, the theoretically optimal linear discriminant is orthogonal to $x_1$ at the decision threshold $d = w_0/w_1 = 0.55$, with $\min P_e = 0.1684$.
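These values can be cross-checked: for a uniform distribution on a unit disk, the probability mass beyond a vertical line $x_1 = a$ (measured from the disk center) is the circular-segment area $(\arccos a - a\sqrt{1-a^2})/\pi$. The following quick verification sketch (not from the text) evaluates this at the midpoint threshold:

```python
import math

def segment_prob(a, r=1.0):
    # P(x1 > a) for a uniform density on a disk of radius r centered at x1 = 0:
    # circular-segment area divided by the disk area pi * r^2
    t = a / r
    return (math.acos(t) - t * math.sqrt(1.0 - t * t)) / math.pi

# The threshold d = 0.55 lies midway between the class means [0 0]^T and [1.1 0]^T;
# by symmetry both classes contribute the same error mass (equal priors assumed)
pe = segment_prob(0.55)
print(round(pe, 4))  # 0.1684
```

The result matches the quoted $\min P_e = 0.1684$.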
Suppose that a training set from the said distributions with $n$ instances per class was available, which for whatever reason was “contaminated” by the addition to class $\omega_{-1}$ of $n_0$ instances, $n_0 \ll n$, with uniform distribution in $]1, 1+l]$ along $x_1$. Figure 5.6 shows an example of such a dataset with $n = 200$ instances per class and $n_0 = 10$ outliers uniformly distributed in $]1, 1+l]$ with $l = 0.2$ (solid circles extending beyond $x_1 = 1$). Also shown is a linear discriminant adjusted by an $R_{ZED}$ perceptron trained with $h = 1$ (fat estimation of the error PDF) during 80 epochs with $\eta = 0.001$.
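Such a contaminated dataset might be generated as follows (a sketch under stated assumptions: the outliers' $x_2$ coordinate is set to 0, since the text only specifies their distribution along $x_1$, and the function names are illustrative):

```python
import math
import random

def disk_sample(center, r=1.0):
    # Uniform point on a disk: radius scaled by sqrt(U) keeps the density uniform
    rho = r * math.sqrt(random.random())
    theta = 2 * math.pi * random.random()
    return (center[0] + rho * math.cos(theta), center[1] + rho * math.sin(theta))

def make_dataset(n=200, n0=10, l=0.2, seed=0):
    random.seed(seed)
    cls_neg = [disk_sample((0.0, 0.0)) for _ in range(n)]
    cls_pos = [disk_sample((1.1, 0.0)) for _ in range(n)]
    # Outliers added to class -1: x1 uniform in ]1, 1+l]; x2 = 0 is an assumption
    outliers = [(1.0 + l * (1.0 - random.random()), 0.0) for _ in range(n0)]
    return cls_neg + outliers, cls_pos

neg, pos = make_dataset()
print(len(neg), len(pos))  # 210 200
```

Only the $n_0$ outlier points of class $\omega_{-1}$ extend beyond $x_1 = 1$, matching the solid circles in Fig. 5.6.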
In order to investigate the influence of the $n_0$ outliers on the determination of the decision threshold $d$, we proceed as follows: we repeat $n_{\exp}$ times the experiment of randomly generating datasets with $2n + n_0$ instances ($n + n_0$ instances for class $\omega_{-1}$, and $n$ instances for class $\omega_{1}$) and train $R_{ZED}$ and $R_{MSE}$ perceptrons, always with the above settings (80 epochs, $\eta = 0.001$, $h = 1$). We do this for several values of $l$, governing the spread of the outliers.
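This protocol might be sketched as below, assuming a Gaussian-kernel estimate $\hat{f}_E(0)$ maximized by gradient ascent for a linear unit $y = w_1 x_1 + w_2 x_2 + w_0$ with targets $\pm 1$. It is a simplified stand-in rather than the text's implementation: the learning rate and epoch count are retuned for this parameterization (the quoted $\eta = 0.001$ belongs to the original setup), the threshold is read off as $d = -w_0/w_1$ (a sign-convention assumption), and the sample sizes are reduced for speed:

```python
import math
import random

def disk(c, r=1.0):
    # Uniform sample on a disk centered at c
    rho = r * math.sqrt(random.random())
    th = 2 * math.pi * random.random()
    return (c[0] + rho * math.cos(th), c[1] + rho * math.sin(th))

def train_zed(data, targets, h=1.0, eta=0.3, epochs=200):
    # Gradient ascent on a Gaussian-kernel estimate of f_E(0); the per-sample
    # weight is k = e * exp(-e^2 / (2 h^2)) (constant factors folded into eta)
    w = [0.1, 0.0, 0.0]  # [w1, w2, w0], small arbitrary start
    n = len(data)
    for _ in range(epochs):
        g = [0.0, 0.0, 0.0]
        for (x1, x2), t in zip(data, targets):
            e = t - (w[0] * x1 + w[1] * x2 + w[2])
            k = e * math.exp(-e * e / (2 * h * h))
            g[0] += k * x1
            g[1] += k * x2
            g[2] += k
        w = [wi + eta * gi / n for wi, gi in zip(w, g)]
    return w

def one_run(n, n0, l):
    neg = [disk((0.0, 0.0)) for _ in range(n)] + \
          [(1.0 + l * (1.0 - random.random()), 0.0) for _ in range(n0)]  # outliers, x2 = 0 assumed
    pos = [disk((1.1, 0.0)) for _ in range(n)]
    data = neg + pos
    targets = [-1.0] * len(neg) + [1.0] * len(pos)
    w = train_zed(data, targets)
    return -w[2] / w[0]  # decision threshold along x1 (sign convention assumed)

random.seed(1)
ds = [one_run(n=50, n0=5, l=0.2) for _ in range(20)]  # n_exp reduced from 500 for speed
mean_d = sum(ds) / len(ds)
std_d = math.sqrt(sum((d - mean_d) ** 2 for d in ds) / len(ds))
print(round(mean_d, 2), round(std_d, 2))
```

With mild contamination the average threshold stays in the vicinity of the theoretical $d = 0.55$, which is the behavior the following figure quantifies for the full-scale experiment.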
Figure 5.7 shows averages of $d \pm \mathrm{std}(d)$ in terms of $l$, obtained in $n_{\exp} = 500$ experiments, for datasets with $n = 200$ instances per class and two values of $n_0$: $n_0 = 10$ (Fig. 5.7a) and $n_0 = 20$ (Fig. 5.7b). The value $l = 1$ corresponds to the no-outlier case. The experimental results shown in Fig. 5.7 clearly indicate that the average $d$ for the $R_{ZED}$ perceptron (thick dashed line) is