see Sect. E.2) we are closely estimating $f_E(0)$ and $\hat{R}_{ZED}$ exhibits what was observed in Example 5.2: $\hat{f}_E(0)$ at $w_0 = 0$ has a lower value than $\hat{f}_E(0)$ at any other off-optimal $w_0$ value. In particular, $\hat{f}_E(0)$ increases for $w_0 \to \pm\infty$.

On the other hand, using a large value for $h$, $\hat{f}_E(0)$ attains its maximum at $w_0 = 0$, corresponding to the min $P_e$ solution. Note that we are not interested in an accurate estimate of the error PDF, but only in "forcing" it to have a large value at the origin. By using a large $h$ value we are computing an oversmoothed KDE (fat estimation) that suffices to mask off-optimal solutions and highlight the optimal ones.
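To make the role of the bandwidth concrete, the following short Python sketch computes the Gaussian-KDE estimate $\hat{f}_E(0)$ for a hypothetical error sample and several $h$ values; the sample, the specific $h$ values, and the kernel normalization are illustrative assumptions, not taken from the text.

```python
import numpy as np

def f_hat_E0(errors, h):
    # Gaussian KDE of the error density evaluated at the origin:
    # f_hat_E(0) = (1 / (n*h)) * sum_i G(e_i / h), G = standard Gaussian kernel.
    u = errors / h
    return np.mean(np.exp(-0.5 * u ** 2)) / (h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
# Hypothetical error sample: most errors close to zero, plus a minority of
# large errors (misclassified instances) clustered around 2.
errors = np.concatenate([rng.normal(0.0, 0.1, 800), rng.normal(2.0, 0.2, 200)])

for h in (0.05, 0.5, 2.0):
    print(f"h = {h:4}   f_hat_E(0) = {f_hat_E0(errors, h):.3f}")
# A small h tracks the fine detail of the error PDF; a large h yields the
# oversmoothed ("fat") estimate, whose value at the origin mainly reflects how
# much error mass lies near zero, which is the quantity the ZED risk rewards.
```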
5.1.2.1 ZED Gradient Ascent
Gradient ascent optimization can be applied to maximize $\hat{R}_{ZED}$ as follows:
Algorithm 5.1 - ZED Gradient ascent algorithm
1. Choose a random initial parameter vector $\mathbf{w}$ (with components $w_k$).
2. Compute the error PDF estimate at the origin using the classifier outputs $y_i = \varphi(\mathbf{x}_i; \mathbf{w})$ at the $n$ available instances.
3. Compute the partial derivatives of $\hat{R}_{ZED}$ with respect to the parameters:
$$\frac{\partial \hat{f}_E(0)}{\partial w_k} \;=\; \frac{1}{nh^2}\sum_{i=1}^{n} G'\!\left(\frac{e_i}{h}\right)\frac{\partial e_i}{\partial w_k} \;=\; -\,\frac{1}{nh^3}\sum_{i=1}^{n} e_i\, G\!\left(\frac{e_i}{h}\right)\frac{\partial e_i}{\partial w_k}, \qquad (5.17)$$
where $\partial e_i/\partial w_k$ depends on the classifier architecture (a worked case for a sigmoid perceptron is sketched just after the algorithm).
4. Update the parameters $w_k^{(m)}$ at each iteration $m$, using an amount $\eta$ (the learning rate) of the gradient:
$$w_k^{(m)} \;=\; w_k^{(m-1)} \;+\; \eta\,\frac{\partial \hat{f}_E(0)}{\partial w_k}. \qquad (5.18)$$
5. Go to step 2 if some stopping criterion is not met.
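As a worked instance of the error derivative needed in step 3, assume for illustration a single perceptron with sigmoid output $y_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + w_0)$ and error $e_i = t_i - y_i$ (the activation and error convention are assumptions here; the text's perceptron examples may differ). The chain rule then gives
$$\frac{\partial e_i}{\partial w_k} \;=\; -\,\frac{\partial y_i}{\partial w_k} \;=\; -\,\sigma'(\mathbf{w}^\top \mathbf{x}_i + w_0)\, x_{ik} \;=\; -\,y_i\,(1-y_i)\, x_{ik},$$
with $x_{i0} \equiv 1$ for the bias weight $w_0$; this is the quantity to plug into (5.17).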
It is interesting to note that, in contrast with the EE risks, this algorithm has an $O(n)$ complexity, similar to the MMSE or MCE gradient descent algorithms.
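For concreteness, here is a minimal Python sketch of Algorithm 5.1 for a sigmoid perceptron on a two-class problem, using the error derivative sketched above and the update (5.18); the synthetic data, activation, learning rate $\eta$, bandwidth $h$ and iteration count are illustrative assumptions rather than settings taken from the examples that follow.

```python
import numpy as np

def zed_gradient_ascent(X, t, h=1.0, eta=0.5, n_iter=300, seed=0):
    """Sketch of Algorithm 5.1 (ZED gradient ascent) for a sigmoid perceptron.

    X: (n, d) instances; t: (n,) targets in {0, 1}.
    Maximizes the Gaussian-KDE estimate f_hat_E(0) via eqs. (5.17)-(5.18)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])           # append bias input
    w = rng.normal(scale=0.1, size=d + 1)          # step 1: random initial weights
    for _ in range(n_iter):
        # step 2: classifier outputs and errors e_i = t_i - y_i
        y = 1.0 / (1.0 + np.exp(-(Xb @ w)))        # sigmoid perceptron output
        e = t - y
        # step 3: gradient of f_hat_E(0), eq. (5.17), with
        # de_i/dw_k = -y_i (1 - y_i) x_ik for the sigmoid perceptron
        G = np.exp(-0.5 * (e / h) ** 2) / np.sqrt(2.0 * np.pi)
        de_dw = -(y * (1.0 - y))[:, None] * Xb
        grad = -(1.0 / (n * h ** 3)) * ((e * G) @ de_dw)
        w = w + eta * grad                         # step 4: ascent update, eq. (5.18)
    return w

# Usage on a toy two-class problem (hypothetical data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
t = np.concatenate([np.zeros(50), np.ones(50)])
w = zed_gradient_ascent(X, t)
y = 1.0 / (1.0 + np.exp(-(np.hstack([X, np.ones((100, 1))]) @ w)))
print("training error rate:", np.mean((y > 0.5) != t))
```

Each iteration visits every instance once, which is the $O(n)$ per-iteration cost noted above.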
In the following examples we apply Algorithm 5.1 to the training of a
perceptron solving a two-class problem.
 