Continuous Risk Functionals - Minimum Error Entropy Classification - page 16

[Figure 2.1: three scatter plots, panels (a), (b) and (c), each with horizontal axis $x_1$ from −3 to 3 and vertical axis $x_2$ from 0 to 1.]

Fig. 2.1 Separating the circles from the crosses by linear regression: a) 600 instances per class; b) 120 instances per class; c) 600 instances per class with noisy $x_2$ of class 1. The linear discriminant solution is the solid line.

defining a planar decision surface, has been obtained, we apply the thresholding function, which in geometrical terms determines a linear decision border (linear discriminant) as shown in Fig. 2.1.

There are, for this classification problem, infinitely many $f_{w^*}(x)$ solutions, corresponding in terms of decision borders to any straight line inside $]-0.05, 0.05[ \times [0, 1]$. In Fig. 2.1a the data consists of 600 instances per class and the MMSE regression solution does indeed result in one of the $P_e = 0$ straight lines. This is the large size case; for large $n$ (say, $n > 400$ instances per class) one practically always obtains solutions with no misclassified instances. Figure 2.1b illustrates the small size case; the solutions may vary widely depending on the particular data sample, from close to $f_{w^*}$ (i.e., with practically no misclassified instances) to largely deviated as in Fig. 2.1b, exhibiting a substantial number of misclassified instances. Finally, in Fig. 2.1c, the same dataset as in Fig. 2.1a was used, but with 0.05 added to component $x_2$ of class 1 ('crosses'); this small "noise" value was enough to provoke a substantial departure from an $f_{w^*}$ solution, in spite of the fact that the data is still linearly separable. The error rate in Fig. 2.1c is now above 3% instead of zero.
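The three settings of Fig. 2.1 can be mimicked with a small least-squares experiment. The sketch below is only an illustration under stated assumptions: the uniform sampling on either side of the gap, the targets −1/+1, the threshold at 0, and the helper names `make_data`, `solve3`, `fit_mmse` and `error_rate` are all hypothetical and not taken from the text, so the printed error rates need not match those of the figure.

```python
import random

def make_data(n_per_class, x2_shift=0.0, seed=0):
    """Return (x1, x2, target) triples; targets are -1 (circles) and +1 (crosses).

    ASSUMPTION: uniform sampling on either side of the gap ]-0.05, 0.05[;
    the sampling distribution is not stated in this excerpt.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append((rng.uniform(-3.0, -0.05), rng.uniform(0.0, 1.0), -1.0))
        data.append((rng.uniform(0.05, 3.0), rng.uniform(0.0, 1.0) + x2_shift, 1.0))
    return data

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    w = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = m[r][3] - sum(m[r][c] * w[c] for c in range(r + 1, 3))
        w[r] = s / m[r][r]
    return w

def fit_mmse(data):
    """Least-squares weights (w0, w1, w2) of f(x) = w0 + w1*x1 + w2*x2,
    obtained from the normal equations."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x1, x2, t in data:
        phi = (1.0, x1, x2)
        for i in range(3):
            b[i] += phi[i] * t
            for j in range(3):
                a[i][j] += phi[i] * phi[j]
    return solve3(a, b)

def error_rate(w, data):
    """Fraction of instances whose thresholded output disagrees with the target."""
    wrong = 0
    for x1, x2, t in data:
        y = 1.0 if w[0] + w[1] * x1 + w[2] * x2 >= 0.0 else -1.0
        wrong += (y != t)
    return wrong / len(data)

# (a) large sample, (b) small sample, (c) same draws as (a) with x2 of class 1 shifted
for label, n, shift in (("a", 600, 0.0), ("b", 120, 0.0), ("c", 600, 0.05)):
    data = make_data(n, x2_shift=shift)
    w = fit_mmse(data)
    print(label, [round(v, 4) for v in w], error_rate(w, data))
```

Because the same seed is reused, setting (c) differs from (a) only by the shift applied to $x_2$ of the crosses. Changing `seed` in setting (b) gives a feel for how much the fitted line varies from sample to sample.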

Example 2.2. Let us assume a univariate two-class problem (input $X$), with Gaussian class conditionals $f_{X|0}$ (left class) and $f_{X|1}$ (right class), with means 0 and 1 and standard deviation 0.5. The classifier task is to determine the best separating $x$ point. Such a classifier is called a data splitter. With equal priors the posterior probabilities, $P_{T|x}$, of the classifier (see formula (1.6)) are as shown with solid line in Fig. 2.2. Note that by symmetry $P_{0|x} = 1 - P_{1|x}$ and the min $P_e$ split point (the decision border) is 0.5.
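The posterior of Example 2.2 and its split point can be checked numerically. The following is a minimal sketch that applies Bayes' rule to the two Gaussian densities and locates the point where the posterior of class 1 crosses 0.5 by bisection; the function names `gauss_pdf`, `posterior_1` and `split_point` are illustrative choices, not notation from the text.

```python
import math

def gauss_pdf(x, mean, sd):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_1(x, mean0=0.0, mean1=1.0, sd=0.5, prior1=0.5):
    """P(class 1 | x) by Bayes' rule for the two Gaussian class conditionals."""
    num = prior1 * gauss_pdf(x, mean1, sd)
    den = num + (1.0 - prior1) * gauss_pdf(x, mean0, sd)
    return num / den

def split_point(lo=-3.0, hi=4.0, tol=1e-12):
    """Bisection for the x where P(1|x) crosses 0.5.

    Relies on the posterior being increasing in x, which holds here because
    both classes share the same standard deviation.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_1(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(split_point())
```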

Now suppose that, due to some implementation “noise”, one computed posteriors $\tilde{P}_{T|x}$ with a deviation $\delta$ such that $\tilde{P}_{1|x} = P_{1|x} - \delta$ for $x \in [P^{-1}_{1|x}(\delta), 0.5]$ and $\tilde{P}_{1|x} = P_{1|x} + \delta$ for $x \in \,]0.5, 1 - P^{-1}_{1|x}(\delta)]$. Below $P^{-1}_{1|x}(\delta)$, $\tilde{P}_{1|x} = 0$, and above $1 - P^{-1}_{1|x}(\delta)$, $\tilde{P}_{1|x} = 1$. With $\tilde{P}_{0|x} = 1 - \tilde{P}_{1|x}$ and $\delta = 0.01$ ($P^{-1}_{1|x}(\delta) = -0.64$) we obtain the dotted line curves shown in Fig. 2.2. The new $\tilde{P}_{T|x}$ are perfectly legitimate posterior probabilities and differ from $P_{T|x}$ by no more than 0.01.
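The perturbed posteriors can be sketched in a few lines. For means 0 and 1, standard deviation 0.5 and equal priors, the posterior of class 1 reduces to a logistic function of $4x - 2$, so its inverse is available in closed form; at $\delta = 0.01$ it evaluates to roughly −0.65, which the text quotes as −0.64. The names `p1`, `p1_inverse` and `p1_noisy` are illustrative, and the clamping to $[0, 1]$ is an added guard against floating-point round-off rather than part of the example.

```python
import math

DELTA = 0.01  # deviation used in the example

def p1(x):
    """True posterior P(1|x): logistic in 4x - 2 for means 0, 1 and sd 0.5."""
    return 1.0 / (1.0 + math.exp(-(4.0 * x - 2.0)))

def p1_inverse(p):
    """The x at which the true posterior P(1|x) equals p."""
    return 0.5 + 0.25 * math.log(p / (1.0 - p))

def p1_noisy(x, delta=DELTA):
    """Perturbed posterior of class 1, following the piecewise definition."""
    lo = p1_inverse(delta)  # below this x the true posterior is smaller than delta
    hi = 1.0 - lo           # mirror image of lo about the split point 0.5
    if x < lo:
        return 0.0
    if x > hi:
        return 1.0
    if x <= 0.5:
        return max(0.0, p1(x) - delta)
    return min(1.0, p1(x) + delta)

# Largest deviation from the true posterior on a grid over [-3, 4]
grid = [-3.0 + 0.001 * i for i in range(7001)]
print(p1_inverse(DELTA), max(abs(p1_noisy(x) - p1(x)) for x in grid))
```

The posterior of class 0 follows as one minus the value returned by `p1_noisy`, so the two perturbed posteriors sum to 1 by construction.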
