$$
\mathbf{J}(\mathbf{w}) =
\begin{bmatrix}
\dfrac{\partial e_1(\mathbf{w})}{\partial w_1} & \dfrac{\partial e_1(\mathbf{w})}{\partial w_2} & \cdots & \dfrac{\partial e_1(\mathbf{w})}{\partial w_{N_p}} \\[1ex]
\dfrac{\partial e_2(\mathbf{w})}{\partial w_1} & \dfrac{\partial e_2(\mathbf{w})}{\partial w_2} & \cdots & \dfrac{\partial e_2(\mathbf{w})}{\partial w_{N_p}} \\[1ex]
\vdots & \vdots & & \vdots \\[1ex]
\dfrac{\partial e_N(\mathbf{w})}{\partial w_1} & \dfrac{\partial e_N(\mathbf{w})}{\partial w_2} & \cdots & \dfrac{\partial e_N(\mathbf{w})}{\partial w_{N_p}}
\end{bmatrix}
\qquad (6.23c)
$$

and $\mathbf{w} = [w_1, w_2, \ldots, w_{N_p}]^T$ is the parameter vector of the network. From (6.23c) it is seen that the dimension of the Jacobian matrix is $(N \times N_p)$, $N$ and $N_p$ being the number of training samples and the number of adjustable network parameters respectively. For the Gauss-Newton method the second term in (6.23b) is assumed to be zero, so that the update according to (6.21a) becomes
$$
\Delta\mathbf{w} = -\left[\mathbf{J}^T(\mathbf{w})\,\mathbf{J}(\mathbf{w})\right]^{-1} \mathbf{J}^T(\mathbf{w})\,\mathbf{e}(\mathbf{w}). \qquad (6.24a)
$$
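As a minimal numerical sketch of (6.23c) and (6.24a) (not the book's program), the following Python code approximates the Jacobian by finite differences; `residuals` is a hypothetical stand-in for the network error function $\mathbf{e}(\mathbf{w})$, which in practice would be differentiated analytically, e.g. by backpropagation. Solving the normal equations with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is cheaper and numerically safer.

```python
import numpy as np

def numerical_jacobian(residuals, w, eps=1e-6):
    """Finite-difference approximation of the (N x Np) Jacobian (6.23c)."""
    e0 = residuals(w)
    J = np.zeros((e0.size, w.size))
    for j in range(w.size):
        w_pert = w.copy()
        w_pert[j] += eps                      # perturb one parameter at a time
        J[:, j] = (residuals(w_pert) - e0) / eps
    return J

def gauss_newton_step(residuals, w):
    """Gauss-Newton update Dw = -[J^T J]^{-1} J^T e, as in (6.24a)."""
    J = numerical_jacobian(residuals, w)
    e = residuals(w)
    # Solve the normal equations rather than inverting J^T J explicitly.
    return -np.linalg.solve(J.T @ J, J.T @ e)
```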
The Levenberg-Marquardt modification of the Gauss-Newton method is
$$
\Delta\mathbf{w} = -\left[\mathbf{J}^T(\mathbf{w})\,\mathbf{J}(\mathbf{w}) + \mu\mathbf{I}\right]^{-1} \mathbf{J}^T(\mathbf{w})\,\mathbf{e}(\mathbf{w}) \qquad (6.24b)
$$
in which $\mathbf{I}$ is the $(N_p \times N_p)$ identity matrix and the parameter $\mu$ is multiplied by some constant factor $\mu_{inc}$ whenever an iteration step increases the value of $V(\mathbf{w})$, and divided by $\mu_{dec}$ whenever a step reduces the value of $V(\mathbf{w})$. Hence, the update according to (6.21b) is
$$
\mathbf{w}(k+1) = \mathbf{w}(k) - \left[\mathbf{J}^T(\mathbf{w})\,\mathbf{J}(\mathbf{w}) + \mu\mathbf{I}\right]^{-1} \mathbf{J}^T(\mathbf{w})\,\mathbf{e}(\mathbf{w}). \qquad (6.24c)
$$
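The $\mu$ schedule just described can be sketched as follows, reusing the hypothetical `numerical_jacobian` from the previous sketch and taking the cost as the sum of squares $V(\mathbf{w}) = \tfrac{1}{2}\mathbf{e}^T\mathbf{e}$ (an assumption; the exact scaling follows (6.23a), not shown here). Whether a step that increases $V(\mathbf{w})$ is rejected or merely penalized with a larger $\mu$ is an implementation choice; this sketch rejects it.

```python
def levenberg_marquardt_step(residuals, w, mu, mu_inc=10.0, mu_dec=10.0):
    """One update per (6.24c); mu_inc and mu_dec are assumed example values."""
    J = numerical_jacobian(residuals, w)
    e = residuals(w)
    V_old = 0.5 * (e @ e)                    # sum-of-squares cost V(w)
    delta = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
    w_new = w - delta                        # trial step per (6.24c)
    e_new = residuals(w_new)
    if 0.5 * (e_new @ e_new) > V_old:
        return w, mu * mu_inc                # step increased V(w): raise mu
    return w_new, mu / mu_dec                # step reduced V(w): lower mu
```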
Note that for large $\mu$ the algorithm becomes the steepest-descent gradient algorithm with step size $\mu^{-1}$, whereas for small $\mu$, i.e. $\mu \approx 0$, it becomes the Gauss-Newton algorithm. Usually $\mu_{inc} = \mu_{dec}$. However, in our program we have selected two different values for them. In order to get even faster convergence, a small momentum term $mo = 0.098$ was also added, so that the final update becomes
$$
\mathbf{w}(k+1) = \mathbf{w}(k) - \left[\mathbf{J}^T(\mathbf{w})\,\mathbf{J}(\mathbf{w}) + \mu\mathbf{I}\right]^{-1} \mathbf{J}^T(\mathbf{w})\,\mathbf{e}(\mathbf{w}) + mo\left[\mathbf{w}(k) - \mathbf{w}(k-1)\right] \qquad (6.24d)
$$
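Putting the pieces together, a minimal training loop for the final update (6.24d) might look as follows; it reuses the hypothetical `levenberg_marquardt_step` sketched above, and `mu0` and `iters` are assumed values not taken from the text.

```python
def train_lm_momentum(residuals, w0, mu0=0.01, mo=0.098, iters=100):
    """Iterate the final update (6.24d): LM step plus momentum mo*[w(k)-w(k-1)]."""
    w_prev = w0.copy()
    w, mu = w0.copy(), mu0
    for _ in range(iters):
        w_next, mu = levenberg_marquardt_step(residuals, w, mu)
        w_next = w_next + mo * (w - w_prev)  # momentum term of (6.24d)
        w_prev, w = w, w_next
    return w

# Toy usage with assumed example data: least-squares fit of e(w) = A @ w - b.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 3)), rng.normal(size=20)
w_fit = train_lm_momentum(lambda w: A @ w - b, np.zeros(3))
```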