see Sect. E.2) we are closely estimating $f_E(0)$ and $\hat{R}_{ZED}$ exhibits what was observed in Example 5.2: $\hat{f}_E(0)$ at the optimal $w_0$ has a lower value than $\hat{f}_E(0)$ at any other off-optimal $w_0$ value. In particular, $\hat{f}_E(0)$ increases for $w_0 \to \pm\infty$.
On the other hand, using a large value for $h$, $\hat{f}_E(0)$ attains its maximum at $w_0 = 0$, corresponding to the min $P_e$ solution. Note that we are not interested in an accurate estimate of the error PDF, but only in “forcing” it to have a large value at the origin. By using a large $h$ value we are computing an oversmoothed KDE (fat estimation) that suffices to mask off-optimal solutions and highlight the optimal ones.
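The effect of the bandwidth can be illustrated with a short sketch, assuming the Gaussian-kernel KDE of the error density at the origin, $\hat{f}_E(0) = \frac{1}{nh}\sum_{i=1}^{n} G(-e_i/h)$, and two invented error samples: one with errors spread around zero (a near-optimal $w_0$) and one with errors concentrated at $0$ and $2$ (the saturated $w_0 \to \infty$ solution of a perceptron with targets $\pm 1$). The following Python/NumPy snippet and its toy data are ours, not the book's:

import numpy as np

def f_E0(e, h):
    # Gaussian-kernel KDE of the error density at the origin:
    # f_E(0) = (1/(n*h)) * sum_i G(-e_i/h), G the standard Gaussian
    u = -e / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h

rng = np.random.default_rng(0)
e_opt = rng.normal(0.0, 0.5, 100)                          # near-optimal: errors spread around 0
e_off = np.concatenate([np.zeros(50), 2.0 * np.ones(50)])  # saturated solution: errors 0 or 2

for h in (0.1, 2.0):
    print(f"h={h}: near-optimal {f_E0(e_opt, h):.2f}, off-optimal {f_E0(e_off, h):.2f}")

With $h = 0.1$ the spike of exactly-zero errors makes the saturated solution score higher (roughly 2.0 versus 0.8), while with $h = 2$ the oversmoothed estimate ranks the near-optimal solution higher (roughly 0.19 versus 0.16); this is the masking effect described above (the numbers are indicative of one random draw).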
5.1.2.1 ZED Gradient Ascent
Gradient ascent optimization can be applied to maximize $\hat{R}_{ZED}$ as follows:
Algorithm 5.1 - ZED Gradient ascent algorithm
1. Choose a random initial parameter vector, $w$ (with components $w_k$).
2. Compute the error PDF estimate at the origin using the classifier outputs $y_i = \varphi(x_i; w)$ at the $n$ available instances.
3. Compute the partial derivatives of
R
ZED
with respect to the parameters:
G
∂e
i
∂w
k
n
∂ f
E
(0)
∂w
k
1
nh
2
e
i
h
=
−
−
=
i
=1
e
i
G
∂e
i
∂w
k
n
e
i
h
1
nh
3
=
−
−
(5.17)
i
=1
where
∂e
i
/∂w
k
depends on the classifier architecture.
4. Update at each iteration, $m$, the parameters $w_k^{(m)}$ using an amount $\eta$ (the learning rate) of the gradient:
$$w_k^{(m)} = w_k^{(m-1)} + \eta \left.\frac{\partial \hat{f}_E(0)}{\partial w_k}\right|_{w_k^{(m-1)}} . \quad (5.18)$$
5. Go to step 2 if some stopping criterion is not met.
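For concreteness, here is a minimal Python/NumPy sketch of Algorithm 5.1 for a tanh perceptron $y = \tanh(w^{\top}x)$ with targets $t_i \in \{-1, +1\}$ and errors $e_i = t_i - y_i$ (hence $\partial e_i/\partial w_k = -(1 - y_i^2)\,x_{ik}$). The function names, the fixed iteration budget standing in for the stopping criterion of step 5, and the toy data are all illustrative assumptions, not the book's code:

import numpy as np

def gaussian(u):
    # standard Gaussian kernel G(u)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def f_E0(e, h):
    # step 2: Gaussian-kernel estimate of the error density at the origin
    return np.mean(gaussian(-e / h)) / h

def zed_gradient_ascent(X, t, h=1.0, eta=0.5, n_iter=200, seed=0):
    # X: (n, d) inputs with a bias column appended; t: targets in {-1, +1}
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = 0.1 * rng.standard_normal(d)            # step 1: random initial parameters
    for _ in range(n_iter):                     # step 5: fixed budget as stopping rule
        y = np.tanh(X @ w)                      # classifier outputs y_i
        e = t - y                               # errors e_i = t_i - y_i
        de_dw = -(1.0 - y**2)[:, None] * X      # de_i/dw_k for the tanh perceptron
        # step 3: gradient of f_E(0), Eq. (5.17)
        grad = -np.sum((e * gaussian(-e / h))[:, None] * de_dw, axis=0) / (n * h**3)
        w = w + eta * grad                      # step 4: ascent update, Eq. (5.18)
    return w, f_E0(t - np.tanh(X @ w), h)

# Toy two-class problem: two Gaussian classes on the real line (invented data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 1)), rng.normal(1.0, 0.5, (50, 1))])
X = np.hstack([X, np.ones((100, 1))])           # append bias input
t = np.concatenate([-np.ones(50), np.ones(50)])
w, f0 = zed_gradient_ascent(X, t, h=1.0)
print("trained weights:", w, " f_E(0):", round(f0, 3))

Each iteration costs one pass over the $n$ instances, in line with the complexity remark below.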
It is interesting to note that, in contrast with the EE risks, this algorithm has an $O(n)$ complexity, similar to the MMSE or MCE gradient descent algorithms.
In the following examples we apply Algorithm 5.1 to the training of a
perceptron solving a two-class problem.