see Sect. E.2) we are closely estimating $f_E(0)$ and $\hat{R}_{ZED}$ exhibits what was observed in Example 5.2: $\hat{f}_E(0)$ at the optimal $w_0$ has a lower value than $\hat{f}_E(0)$ at any other off-optimal $w_0$ value. In particular, $\hat{f}_E(0)$ increases for $w_0 \to \pm\infty$.
On the other hand, using a large value for $h$, $\hat{f}_E(0)$ attains its maximum at $w_0 = 0$, corresponding to the min $P_e$ solution. Note that we are not interested in an accurate estimate of the error PDF, but only in “forcing” it to have a large value at the origin. By using a large $h$ value we are computing an oversmoothed KDE (fat estimation) that suffices to mask off-optimal solutions and highlight the optimal ones.
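The effect of the bandwidth can be illustrated with a short sketch, assuming the Gaussian-kernel KDE of the error density at the origin, $\hat{f}_E(0) = \frac{1}{nh}\sum_{i=1}^{n} G(-e_i/h)$, and two invented error samples: one with errors spread around zero (a near-optimal $w_0$) and one with errors concentrated at $0$ and $2$ (the saturated $w_0 \to \infty$ solution of a perceptron with targets $\pm 1$). The following Python/NumPy snippet and its toy data are ours, not the book's:

import numpy as np

def f_E0(e, h):
    # Gaussian-kernel KDE of the error density at the origin:
    # f_E(0) = (1/(n*h)) * sum_i G(-e_i/h), G the standard Gaussian
    u = -e / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h

rng = np.random.default_rng(0)
e_opt = rng.normal(0.0, 0.5, 100)                          # near-optimal: errors spread around 0
e_off = np.concatenate([np.zeros(50), 2.0 * np.ones(50)])  # saturated solution: errors 0 or 2

for h in (0.1, 2.0):
    print(f"h={h}: near-optimal {f_E0(e_opt, h):.2f}, off-optimal {f_E0(e_off, h):.2f}")

With $h = 0.1$ the spike of exactly-zero errors makes the saturated solution score higher (roughly 2.0 versus 0.8), while with $h = 2$ the oversmoothed estimate ranks the near-optimal solution higher (roughly 0.19 versus 0.16); this is the masking effect described above (the numbers are indicative of one random draw).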
5.1.2.1 ZED Gradient Ascent
Gradient ascent optimization can be applied to maximize $\hat{R}_{ZED}$ as follows:
Algorithm 5.1 - ZED Gradient ascent algorithm
1. Choose a random initial parameter vector, $w$ (with components $w_k$).
2. Compute the error PDF estimate at the origin using the classifier outputs $y_i = \varphi(x_i; w)$ at the $n$ available instances.
3. Compute the partial derivatives of
R
ZED
with respect to the parameters:
G
∂e
i
∂w
k
n
∂ f
E
(0)
∂w
k
1
nh
2
e
i
h
=
−
−
=
i
=1
e
i
G
∂e
i
∂w
k
n
e
i
h
1
nh
3
=
−
−
(5.17)
i
=1
where
∂e
i
/∂w
k
depends on the classifier architecture.
4. Update at each iteration, $m$, the parameters $w_k^{(m)}$ using an amount $\eta$ (the learning rate) of the gradient:
$$w_k^{(m)} = w_k^{(m-1)} + \eta \left.\frac{\partial \hat{f}_E(0)}{\partial w_k}\right|_{w_k^{(m-1)}} . \quad (5.18)$$
5. Go to step 2 if some stopping criterion is not met.
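For concreteness, here is a minimal Python/NumPy sketch of Algorithm 5.1 for a tanh perceptron $y = \tanh(w^{\top}x)$ with targets $t_i \in \{-1, +1\}$ and errors $e_i = t_i - y_i$ (hence $\partial e_i/\partial w_k = -(1 - y_i^2)\,x_{ik}$). The function names, the fixed iteration budget standing in for the stopping criterion of step 5, and the toy data are all illustrative assumptions, not the book's code:

import numpy as np

def gaussian(u):
    # standard Gaussian kernel G(u)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def f_E0(e, h):
    # step 2: Gaussian-kernel estimate of the error density at the origin
    return np.mean(gaussian(-e / h)) / h

def zed_gradient_ascent(X, t, h=1.0, eta=0.5, n_iter=200, seed=0):
    # X: (n, d) inputs with a bias column appended; t: targets in {-1, +1}
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = 0.1 * rng.standard_normal(d)            # step 1: random initial parameters
    for _ in range(n_iter):                     # step 5: fixed budget as stopping rule
        y = np.tanh(X @ w)                      # classifier outputs y_i
        e = t - y                               # errors e_i = t_i - y_i
        de_dw = -(1.0 - y**2)[:, None] * X      # de_i/dw_k for the tanh perceptron
        # step 3: gradient of f_E(0), Eq. (5.17)
        grad = -np.sum((e * gaussian(-e / h))[:, None] * de_dw, axis=0) / (n * h**3)
        w = w + eta * grad                      # step 4: ascent update, Eq. (5.18)
    return w, f_E0(t - np.tanh(X @ w), h)

# Toy two-class problem: two Gaussian classes on the real line (invented data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 1)), rng.normal(1.0, 0.5, (50, 1))])
X = np.hstack([X, np.ones((100, 1))])           # append bias input
t = np.concatenate([-np.ones(50), np.ones(50)])
w, f0 = zed_gradient_ascent(X, t, h=1.0)
print("trained weights:", w, " f_E(0):", round(f0, 3))

Each iteration costs one pass over the $n$ instances, in line with the complexity remark below.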
It is interesting to note that, in contrast with the EE risks, this algorithm has an $O(n)$ complexity, similar to the MMSE or MCE gradient descent algorithms.
In the following examples we apply Algorithm 5.1 to the training of a
perceptron solving a two-class problem.