Continuous Risk Functionals - Minimum Error Entropy Classification - page 24

Information Technology Reference

In-Depth Information

6

L(e)

4

2

0

e

−2

0

0.5

1

1.5

2

Fig. 2.4

L MSE (solid line) and L CE ( for t =1) (dashed line) as functions of e .

Thus, for cross-entropy we have a logarithmic loss function, L CE ( t, e )=

−

te ). Figure 2.4 shows (for t =1)the distance functions L MSE ( e )= e 2

and L CE ( t, e )=

ln(2

−

te ). A classifier minimizing the MSE risk functional

is minimizing the second-order moment of the errors, favoring input-output

mappings with low error spread (low variance) and deviation from zero. A

classifier minimizing the CE risk functional is minimizing an average loga-

rithmic distance of the error from its worst value (respectively, 2 for t =1

and

−

ln(2

−

1), as shown in Fig. 2.4. As a consequence of the logarithmic

behavior L CE ( t, e ) tends to focus mainly on large errors. Note that one may

have in some cases to restrict Y to the open interval ]

−

2 for t =

−

−

1 , 1[ in order to satisfy

the integrability condition of (2.31).

Example 2.3. Let us consider a two-class problem with target set T

=

{

,P (0) = P (1) = 1 / 2 and a classifier codomain restricted to [0 , 1] ac-

cording to the following family of uniform PDFs:

f Y ( y )= u ( y ;0 ,d )

0 , 1

}

if T =0

.

(2.32)

u ( y ;1

−

d, 1) if T =1

1

2 u ( e ;0 ,d )+ 2 u ( e ;

Note that according to (2.22) f E ( e ) is distributed as

−

d,

0) = u ( e ;

−

d, d ). Therefore, the MSE risk is simply the variance of this

distribution:

R MSE ( d )= d 2

3 .

(2.33)

We now compute the cross-entropy risk, which is in this case easier to derive

from f Y |t . First note that (2.28) is n times the empirical estimate of

R CE ( d )=

−

P (0)

E

[ln(1

−

Y )

|

T =0]

−

P (1)

E

[ln( Y )

|

T =1] .

(2.34)

Next Page

Minimum Error Entropy Classification

Search WWH ::

Custom Search

Home