for $n$ odd: $f_{r_n}(x) = \sqrt{x}\,P_{(n-3)/2}(x)$, with, for $n \geq 5$:
$$P_{(n-3)/2}(x) = 1 + \sum_{k=1}^{(n-3)/2} \frac{(2k-1)!!}{2^k\,k!}\,(1-x)^k;$$
for $n = 3$: $P_0(x) = 1$; for $n = 1$: $P_{-1}(x) = 0$.
2.10.5 Optimization Methods: Levenberg-Marquardt and BFGS
The presentation given here follows [Oussar 1998]; the algorithms are also described in [Press et al. 1992].
2.10.5.1 The BFGS Algorithm
The BFGS algorithm consists in updating the parameters, at iteration $i$, by
$$w(i) = w(i-1) - \mu_i M_i \nabla J(w(i-1)),$$
where $\mu_i$ is positive, and where $M_i$ is an approximation, computed iteratively, of the inverse of the Hessian matrix; the latter is computed, at each iteration, by
$$M_i = M_{i-1} + \left[1 + \frac{\gamma_{i-1}^{\mathrm{T}} M_{i-1} \gamma_{i-1}}{\delta_{i-1}^{\mathrm{T}} \gamma_{i-1}}\right] \frac{\delta_{i-1}\,\delta_{i-1}^{\mathrm{T}}}{\delta_{i-1}^{\mathrm{T}} \gamma_{i-1}} - \frac{\delta_{i-1}\,\gamma_{i-1}^{\mathrm{T}} M_{i-1} + M_{i-1}\,\gamma_{i-1}\,\delta_{i-1}^{\mathrm{T}}}{\delta_{i-1}^{\mathrm{T}} \gamma_{i-1}},$$
where $\gamma_{i-1} = \nabla J(w(i)) - \nabla J(w(i-1))$ and $\delta_{i-1} = w(i) - w(i-1)$. The initial value $M_0$ is the identity matrix. If, at some iteration, the matrix is not found to be positive definite, it is re-initialized to the identity matrix.
That approximation is valid only in the neighborhood of a minimum of the cost function. Therefore, it is recommended to use simple gradient descent (or stochastic gradient descent) at the beginning of training, in order to get close to a minimum, then switch to BFGS when the minimum is close enough.
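To make the update rule concrete, here is a minimal NumPy sketch of this BFGS iteration on a hypothetical quadratic cost (the matrix `A`, vector `b`, and the fixed step `mu` are illustrative assumptions; in practice $\mu_i$ would be chosen by a line search):

```python
import numpy as np

# Hypothetical quadratic cost J(w) = 0.5 w^T A w - b^T w, used only to
# illustrate the update rule; A and b are not from the text.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad_J(w):
    return A @ w - b

w = np.zeros(2)
M = np.eye(2)    # M_0: the identity matrix
mu = 0.5         # positive step (a line search would normally set it)

for i in range(100):
    g = grad_J(w)
    w_new = w - mu * M @ g        # w(i) = w(i-1) - mu_i M_i grad J(w(i-1))
    gamma = grad_J(w_new) - g     # gamma_{i-1} = grad J(w(i)) - grad J(w(i-1))
    delta = w_new - w             # delta_{i-1} = w(i) - w(i-1)
    dg = delta @ gamma
    if dg > 1e-12:
        # BFGS update of the inverse-Hessian approximation
        M = (M
             + (1.0 + gamma @ M @ gamma / dg) * np.outer(delta, delta) / dg
             - (np.outer(delta, gamma @ M) + np.outer(M @ gamma, delta)) / dg)
    else:
        M = np.eye(2)             # re-initialize when the update is unsafe
    w = w_new

print(w)   # close to the minimizer A^{-1} b, i.e. about (0.2, 0.4)
```

On a quadratic cost, $M_i$ converges to the exact inverse Hessian $A^{-1}$ after a few updates, which is why the iterates contract rapidly toward the minimum.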
2.10.5.2 The Levenberg-Marquardt Algorithm
The Levenberg-Marquardt algorithm consists in updating the parameters, at iteration $i$, by
$$w(i) = w(i-1) - \left[H(w(i-1)) + \mu_i I\right]^{-1} \nabla J(w(i-1)),$$
where $\mu_i$ is positive. For small values of the step $\mu_i$, the Levenberg-Marquardt algorithm is close to the Newton method, whereas, for large values of $\mu_i$, the Levenberg-Marquardt algorithm is equivalent to simple gradient descent with step $1/\mu_i$.
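The two limiting regimes follow directly from comparing $\mu_i$ with the eigenvalues of $H$:

```latex
% mu_i small compared with the eigenvalues of H: the damping term is negligible
[H + \mu_i I]^{-1} \approx H^{-1}
\quad\Rightarrow\quad
w(i) \approx w(i-1) - H^{-1}\,\nabla J(w(i-1))
% which is the Newton step.

% mu_i large compared with the eigenvalues of H: H itself is negligible
[H + \mu_i I]^{-1} \approx \tfrac{1}{\mu_i}\, I
\quad\Rightarrow\quad
w(i) \approx w(i-1) - \tfrac{1}{\mu_i}\,\nabla J(w(i-1))
% which is simple gradient descent with step 1/mu_i.
```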
The application of the algorithm requires the inversion of the matrix $[H(w(i-1)) + \mu_i I]$. The exact expression of the Hessian matrix of the total cost function $J(w)$ is
$$H(w(i)) = \sum_{k=1}^{N} \frac{\partial e_k}{\partial w(i)} \left(\frac{\partial e_k}{\partial w(i)}\right)^{\mathrm{T}} + \sum_{k=1}^{N} \frac{\partial^2 e_k}{\partial w(i)\,(\partial w(i))^{\mathrm{T}}}\, e_k,$$
with $e_k = y_p^k - y^k$.
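As an illustration, here is a minimal NumPy sketch of this update on a hypothetical least-squares fit of a linear model (the data and the fixed $\mu_i$ are assumptions; note that for a linear model the second-order term of the Hessian vanishes, so the sum of gradient outer products is exact):

```python
import numpy as np

# Illustrative linear model y(x, w) = w[0] + w[1] * x fitted to synthetic
# data generated from y = 1 + 2x; these values are not from the text.
x = np.array([0.0, 1.0, 2.0, 3.0])
yp = np.array([1.0, 3.0, 5.0, 7.0])      # y_p^k: measured values

def residuals(w):
    return yp - (w[0] + w[1] * x)        # e_k = y_p^k - y^k

def lm_step(w, mu):
    e = residuals(w)
    # Each row of Jac is de_k/dw; for this model it is constant in w.
    Jac = np.column_stack([-np.ones_like(x), -x])
    H = Jac.T @ Jac                      # sum_k (de_k/dw)(de_k/dw)^T
    grad = Jac.T @ e                     # gradient of J(w) = 0.5 sum_k e_k^2
    # w(i) = w(i-1) - [H + mu_i I]^{-1} grad J(w(i-1))
    return w - np.linalg.solve(H + mu * np.eye(2), grad)

w = np.zeros(2)
for i in range(30):
    w = lm_step(w, mu=1e-3)
print(w)   # approaches the generating parameters (1, 2)
```

In a full implementation, $\mu_i$ would be adapted at each iteration (increased when a step raises the cost, decreased when it lowers it), which is what makes the method interpolate between the Newton and gradient regimes.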