MEE with Discrete Errors - Minimum Error Entropy Classification


We remark that the proof has only analyzed $\partial H_S / \partial w_0$ because the other derivatives (and consequently the complete gradient $\nabla H_S$) are rather intricate. Thus, equality of class error probabilities is just a necessary condition. The following example illustrates this result.

Example 4.5. Consider the perceptron implementing the family of lines $w_1 x_1 + w_2 x_2 + w_0 = 0$ to discriminate between two bivariate Gaussian classes. First, let $\mu_{\pm 1} = [\pm 2 \;\; 0]^T$ and $\Sigma_1 = \Sigma_{-1} = I$. The optimal solution is given (as a function of $p$) by the vertical line with equation

$$x_1 = \frac{1}{4}\ln\frac{1-p}{p}.$$
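As a sketch of where this line comes from, assume that $p$ denotes the prior $P(T = 1)$ (an assumed convention, so class $-1$ has prior $1-p$) and equate the prior-weighted class densities. With $\Sigma_{\pm 1} = I$ the normalizing constants and the $x_2$ terms cancel:

$$
\begin{aligned}
\ln p - \tfrac{1}{2}\|\mathbf{x}-\mu_1\|^2 &= \ln(1-p) - \tfrac{1}{2}\|\mathbf{x}-\mu_{-1}\|^2\\
\tfrac{1}{2}\left[(x_1+2)^2 - (x_1-2)^2\right] &= \ln\frac{1-p}{p}\\
4x_1 &= \ln\frac{1-p}{p}.
\end{aligned}
$$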

The optimal set of parameters must satisfy $w_2 = 0$ and $w_0 = -\frac{w_1}{4}\ln\frac{1-p}{p}$. Additionally, $w_1$ must be positive to give the correct class orientation. One can then numerically determine that $\nabla H_S(\mathbf{w}^*) = 0$ only if $p = 1/2$, which corresponds to the class setting with equal class error probabilities.
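A minimal sketch of such a numerical check follows. It is an illustration under stated assumptions, not the authors' procedure: it assumes $p = P(T = 1)$, assigns $u = w_1 x_1 + w_2 x_2 + w_0 \ge 0$ to class 1, takes $P_t$ as the joint probability of misclassifying class $t$, and takes $H_S$ to be the Shannon entropy of a three-valued error with probabilities $P_{-1}$, $P_1$ and $1 - P_{-1} - P_1$. The gradient is a central finite-difference approximation, and all function names are hypothetical.

```python
import math


def phi_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# First setting of Example 4.5: mu_{+1} = [2, 0], mu_{-1} = [-2, 0], Sigma = I for both classes.
MU = {1: (2.0, 0.0), -1: (-2.0, 0.0)}


def error_probs(w1, w2, w0, p):
    """Return (P_{-1}, P_1), assuming p = P(T = 1) and u >= 0 -> class 1."""
    s = math.hypot(w1, w2)  # sqrt(w^T I w)
    m_neg = w1 * MU[-1][0] + w2 * MU[-1][1] + w0
    m_pos = w1 * MU[1][0] + w2 * MU[1][1] + w0
    p_neg = (1.0 - p) * phi_cdf(m_neg / s)  # class -1 sample with u > 0
    p_pos = p * phi_cdf(-m_pos / s)         # class 1 sample with u < 0
    return p_neg, p_pos


def shannon_error_entropy(w1, w2, w0, p):
    """Assumed H_S: entropy (nats) of the error with probs P_{-1}, P_1, 1 - P_{-1} - P_1."""
    p_neg, p_pos = error_probs(w1, w2, w0, p)
    probs = (p_neg, p_pos, 1.0 - p_neg - p_pos)
    return -sum(q * math.log(q) for q in probs if q > 0.0)


def numerical_gradient(p, w, h=1e-5):
    """Central finite-difference gradient of H_S with respect to (w1, w2, w0)."""
    grad = []
    for i in range(3):
        up, dn = list(w), list(w)
        up[i] += h
        dn[i] -= h
        diff = shannon_error_entropy(*up, p) - shannon_error_entropy(*dn, p)
        grad.append(diff / (2.0 * h))
    return grad


def optimal_weights(p, w1=1.0):
    """w* = (w1, 0, -(w1/4) ln((1-p)/p)): the vertical line x1 = (1/4) ln((1-p)/p)."""
    return (w1, 0.0, -w1 / 4.0 * math.log((1.0 - p) / p))


for p in (0.3, 0.4, 0.5, 0.6):
    g = numerical_gradient(p, optimal_weights(p))
    print(f"p = {p:.1f}: grad H_S(w*) = [{g[0]: .2e}, {g[1]: .2e}, {g[2]: .2e}]")
```

In this sketch the gradient vanishes (up to rounding) only for $p = 0.5$, where the two class error probabilities coincide; for the other priors the $w_0$ component stays clearly away from zero.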

If we now assume $p = 1/2$ and $\Sigma_1 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$, the optimal solution is

$$x_1 = -6 + \sqrt{32 + 2\ln 2}.$$
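This value can be recovered, as a short sketch, by equating the two class densities along $x_1$ (equal priors; the $x_2$ factors are identical because both classes have unit variance in $x_2$), with $x_1 \sim N(2, 2)$ for class 1 and $x_1 \sim N(-2, 1)$ for class $-1$:

$$
\begin{aligned}
\frac{1}{\sqrt{2}}\exp\!\left(-\frac{(x_1-2)^2}{4}\right) &= \exp\!\left(-\frac{(x_1+2)^2}{2}\right)\\
(x_1-2)^2 + 2\ln 2 &= 2(x_1+2)^2\\
x_1^2 + 12x_1 + 4 - 2\ln 2 &= 0\\
x_1 &= -6 \pm \sqrt{32 + 2\ln 2}.
\end{aligned}
$$

The root with the plus sign, $x_1 \approx -0.222$, is the one lying between the two class means.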

The error probabilities are unequal, $P_{-1} \approx 0.019$ and $P_1 \approx 0.029$, and

$$\nabla H_S\left(w_1,\, 0,\, w_1\left(6 - \sqrt{32 + 2\ln 2}\right)\right) \neq 0. \qquad (4.60)$$

More precisely, $\partial H_S / \partial w_1 < 0$, $\partial H_S / \partial w_2 = 0$ and $\partial H_S / \partial w_0 > 0$ at the possible optimal solutions. Therefore, the optimal solution is not a critical point of the error entropy.
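The sign pattern above can be checked with a small numerical sketch. Assumptions, none of them taken from this passage: $u = \mathbf{w}^T\mathbf{x} + w_0 \ge 0$ is assigned to class 1, the error probabilities are the equal-prior Gaussian expressions of Eq. (4.62) below, $H_S$ is the Shannon entropy of the three-valued error with probabilities $P_{-1}$, $P_1$, $1 - P_{-1} - P_1$, and derivatives are central finite differences. Function names are illustrative only.

```python
import math


def phi_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Second setting of Example 4.5: equal priors, mu_{+-1} = [+-2, 0],
# Sigma_1 = diag(2, 1), Sigma_{-1} = I (both covariances are diagonal here).
MU = {1: (2.0, 0.0), -1: (-2.0, 0.0)}
VAR = {1: (2.0, 1.0), -1: (1.0, 1.0)}


def error_probs(w1, w2, w0):
    """(P_{-1}, P_1) for equal priors, assuming u >= 0 -> class 1."""
    def mean_std(t):
        m = w1 * MU[t][0] + w2 * MU[t][1] + w0
        s = math.sqrt(w1 * w1 * VAR[t][0] + w2 * w2 * VAR[t][1])  # sqrt(w^T Sigma_t w)
        return m, s

    m_neg, s_neg = mean_std(-1)
    m_pos, s_pos = mean_std(1)
    return 0.5 * phi_cdf(m_neg / s_neg), 0.5 * phi_cdf(-m_pos / s_pos)


def shannon_error_entropy(w1, w2, w0):
    """Assumed H_S: entropy (nats) of the error with probs P_{-1}, P_1, 1 - P_{-1} - P_1."""
    p_neg, p_pos = error_probs(w1, w2, w0)
    probs = (p_neg, p_pos, 1.0 - p_neg - p_pos)
    return -sum(q * math.log(q) for q in probs if q > 0.0)


def numerical_gradient(w, h=1e-5):
    """Central finite-difference gradient of H_S with respect to (w1, w2, w0)."""
    grad = []
    for i in range(3):
        up, dn = list(w), list(w)
        up[i] += h
        dn[i] -= h
        grad.append((shannon_error_entropy(*up) - shannon_error_entropy(*dn)) / (2.0 * h))
    return grad


x1_opt = -6.0 + math.sqrt(32.0 + 2.0 * math.log(2.0))  # optimal vertical line
w1 = 1.0                                                 # any w1 > 0 gives the same line
w_opt = (w1, 0.0, -w1 * x1_opt)                          # w0 = w1 (6 - sqrt(32 + 2 ln 2))

p_neg, p_pos = error_probs(*w_opt)
grad = numerical_gradient(w_opt)
print(f"x1 = {x1_opt:.4f}, P_-1 = {p_neg:.3f}, P_1 = {p_pos:.3f}")
print("dH/dw1, dH/dw2, dH/dw0 =", [f"{g:.2e}" for g in grad])
```

Under these assumptions the sketch reproduces the quoted error probabilities and gives a negative $w_1$ component, a zero $w_2$ component and a positive $w_0$ component.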

The above example indicates that, from now on, it suffices to analyze the case of bivariate Gaussian class distributions to get a picture of the discrete MEE (SEE) behavior regarding the optimality issue. Recall from Sect. 3.3.1 that Gaussianity is preserved under linear transformations. Therefore, if the classes have means $\mu_t$ and covariances $\Sigma_t$ for $t \in \{-1, 1\}$, it is straightforward to obtain

$$F_{U|t}(0) = \Phi\left(-\frac{\mathbf{w}^T\mu_t + w_0}{\sqrt{\mathbf{w}^T\Sigma_t\mathbf{w}}}\right). \qquad (4.61)$$
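Equation (4.61) can be sanity-checked by simulation. The sketch below assumes $U = \mathbf{w}^T\mathbf{x} + w_0$ is the perceptron output before thresholding and compares the closed form with a Monte Carlo estimate for one set of made-up two-dimensional class parameters; the parameter values and function names are hypothetical.

```python
import math
import random


def phi_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def f_u_at_zero(w, w0, mu, sigma):
    """Eq. (4.61): P(U <= 0 | t) for U = w^T x + w0 with x ~ N(mu, sigma) in 2-D."""
    mean_u = w[0] * mu[0] + w[1] * mu[1] + w0
    var_u = (w[0] * w[0] * sigma[0][0] + 2.0 * w[0] * w[1] * sigma[0][1]
             + w[1] * w[1] * sigma[1][1])  # w^T Sigma w
    return phi_cdf(-mean_u / math.sqrt(var_u))


def f_u_at_zero_mc(w, w0, mu, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of the same probability, sampling x via a 2x2 Cholesky factor."""
    rng = random.Random(seed)
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 * l21)
    hits = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu[0] + l11 * z1
        x2 = mu[1] + l21 * z1 + l22 * z2
        if w[0] * x1 + w[1] * x2 + w0 <= 0.0:
            hits += 1
    return hits / n


# Hypothetical class parameters, chosen only for illustration.
w, w0 = (1.0, -0.5), 0.3
mu, sigma = (0.5, -1.0), ((2.0, 0.6), (0.6, 1.0))
print(f_u_at_zero(w, w0, mu, sigma), f_u_at_zero_mc(w, w0, mu, sigma))
```

The two numbers should agree to within Monte Carlo error (about $10^{-3}$ for this sample size), which is just the statement that $U$ is Gaussian with mean $\mathbf{w}^T\mu_t + w_0$ and variance $\mathbf{w}^T\Sigma_t\mathbf{w}$.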

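For equal priors one gets

$$P_{-1} = \frac{1}{2}\Phi\left(\frac{\mathbf{w}^T\mu_{-1} + w_0}{\sqrt{\mathbf{w}^T\Sigma_{-1}\mathbf{w}}}\right); \qquad P_1 = \frac{1}{2}\Phi\left(-\frac{\mathbf{w}^T\mu_1 + w_0}{\sqrt{\mathbf{w}^T\Sigma_1\mathbf{w}}}\right). \qquad (4.62)$$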
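Expression (4.62) translates directly into code. The sketch below evaluates it for arbitrary 2×2 covariances, assumes that $u \ge 0$ is assigned to class 1, and (as an assumption about the discrete MEE setting rather than something stated in this passage) combines the two class error probabilities into the Shannon entropy of a three-valued error. Dictionary keys and function names are illustrative.

```python
import math


def phi_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quad_form(w, sigma):
    """w^T Sigma w for a 2-vector w and a 2x2 matrix Sigma (nested tuples)."""
    return sum(w[i] * sigma[i][j] * w[j] for i in range(2) for j in range(2))


def class_error_probs(w, w0, mu, sigma):
    """Eq. (4.62), equal priors: returns (P_{-1}, P_1).

    mu and sigma are dicts keyed by the class label t in {-1, 1}.
    """
    def z(t):
        mean_u = w[0] * mu[t][0] + w[1] * mu[t][1] + w0
        return mean_u / math.sqrt(quad_form(w, sigma[t]))

    return 0.5 * phi_cdf(z(-1)), 0.5 * phi_cdf(-z(1))


def error_entropy(p_neg, p_pos):
    """Shannon entropy (nats) of a three-valued error with probs P_{-1}, P_1, 1 - P_{-1} - P_1."""
    probs = (p_neg, p_pos, 1.0 - p_neg - p_pos)
    return -sum(q * math.log(q) for q in probs if q > 0.0)


identity = ((1.0, 0.0), (0.0, 1.0))
mu = {1: (2.0, 0.0), -1: (-2.0, 0.0)}
sigma = {1: identity, -1: identity}

# Vertical line through the origin for the spherical classes of Example 4.5.
p_neg, p_pos = class_error_probs((1.0, 0.0), 0.0, mu, sigma)
print(p_neg, p_pos, error_entropy(p_neg, p_pos))
```

For the spherical classes of Example 4.5 and the vertical line through the origin, both probabilities equal $\frac{1}{2}\Phi(-2)$, as the symmetry of that setting requires.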

Unfortunately, these expressions lead to a rather intricate entropy formula and equally intricate derivatives. Let us consider spherical distributions with $\Sigma_{-1} = \Sigma_1 = I$, to obtain a linear (optimal) solution and, in order to simplify
