Fig. 5.5 Weight functions: a) ψ_MSE; b) ψ_CE for t = 1; c) ψ_CE for t = 0; d) ψ_ZED for h = 0.1; e) ψ_ZED for h = 1; f) ψ_ZED for h = 10.
We may then write formulas (5.20) to (5.22) as
\[
\frac{\partial R}{\partial w} \;=\; k \sum_{i} \psi(e_i)\,\frac{\partial y_i}{\partial w}\,.
\]
Here we omit the constants 1/n and 1/(nh³) from ψ_MSE and ψ_ZED, respectively. This is unimportant from the point of view of optimization, as one could always multiply R_MSE and R_ZED by n and nh³, respectively, without affecting their extrema. As discussed in [214], this only affects the behavior of the learning process by increasing the number of epochs needed to converge.
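To make the unified form above concrete, here is a minimal NumPy sketch of the weighted gradient. The names grad_R and dy_dw and the constant k = 1 are illustrative only; in practice the per-sample derivatives ∂y_i/∂w would come from backpropagation through the classifier, which is not shown here.

```python
import numpy as np

def grad_R(psi, e, dy_dw, k=1.0):
    """Unified gradient dR/dw = k * sum_i psi(e_i) * dy_i/dw,
    with the 1/n and 1/(n h^3) constants omitted as in the text.

    psi    -- weight function applied element-wise to the errors
    e      -- per-sample errors e_i, shape (n,)
    dy_dw  -- per-sample output derivatives dy_i/dw, shape (n, p)
    """
    weights = psi(e)                                  # per-sample weights psi(e_i)
    return k * (weights[:, None] * dy_dw).sum(axis=0) # weighted sum over the samples
```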
Figure 5.5 presents a comparison of the behavior of the weight functions. From Fig. 5.5a we see that ψ_MSE is linear, so that each error contributes with a weight equal to its own value. Thus, larger errors are penalized more, contributing a larger weight to the whole gradient. On the other hand, ψ_CE confers even larger weights to larger errors. As Figs. 5.5b and 5.5c show, this weight assignment follows a hyperbolic-type rule (in contrast with the linear rule of ψ_MSE).
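As a rough numerical companion to Fig. 5.5, the sketch below contrasts the linear ψ_MSE with a ψ_ZED of Gaussian-kernel shape. The exponential expression used for ψ_ZED is an assumption consistent with the omitted 1/(nh³) factor and with the shapes in Figs. 5.5d-f, not a reproduction of formula (5.22); ψ_CE is left out because its hyperbolic form depends on the target value t.

```python
import numpy as np

def psi_mse(e):
    # Linear rule: each error is weighted by its own value (Fig. 5.5a).
    return e

def psi_zed(e, h):
    # Assumed Gaussian-kernel shape (illustration only): the weight vanishes
    # for |e| >> h and becomes nearly linear in e when h is large.
    return e * np.exp(-e**2 / (2.0 * h**2)) / np.sqrt(2.0 * np.pi)

e = np.linspace(-2.0, 2.0, 5)
print(psi_mse(e))          # grows linearly with the error
print(psi_zed(e, h=0.1))   # ~0 except very close to e = 0 (cf. Fig. 5.5d)
print(psi_zed(e, h=10.0))  # almost proportional to e on [-2, 2] (cf. Fig. 5.5f)
```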
Now, for ψ_ZED one may distinguish three basic behaviors:
1. If h is small, as in Fig. 5.5d, ψ_ZED(e) ≈ 0 for a large sub-domain of the variable e. This may cause difficulties for the learning process to converge (or even start at all). In fact, it is common procedure to randomly initialize the classifier's parameters with values close to zero, producing errors around e = −1 and e = 1. In this case, the learning process would not converge.