$$E(\mathbf{w}) = \beta \sum_{q=1}^{Q} \left[ t_q - f(\mathbf{p}_q, \mathbf{w}) \right]^2 + \alpha \sum_{i=1}^{W} w_i^2 \qquad (10)$$
where: $\beta$ is an inverse noise variance parameter, $\alpha$ is a weight decay parameter, $t_q$ is the target data, $f$ is the estimate realized by the NN, $\mathbf{p}$ is the vector of input sets, $\mathbf{w}$ is the vector of weights (and biases), $Q$ is the number of training examples, and $W$ is the total number of weights.

Equation (10) can be further rewritten as:
$$E(\mathbf{w}) = \beta E_D + \alpha E_w \qquad (11)$$
where:
$$E_D = \sum_{q=1}^{Q} \left[ t_q - f(\mathbf{p}_q, \mathbf{w}) \right]^2 \qquad (12)$$
$$E_w = \sum_{i=1}^{W} w_i^2 \qquad (13)$$
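To make Eqs. (10)-(13) concrete, the following is a minimal NumPy sketch of the regularized cost; the network function `f` and all argument names are illustrative assumptions, not part of the source:

```python
import numpy as np

def regularized_cost(w, f, P, t, alpha, beta):
    """Regularized cost E(w) = beta*E_D + alpha*E_w of Eqs. (10)-(13).

    w     : flat vector of weights (and biases), length W
    f     : network function, f(p_q, w) -> scalar estimate
    P     : input sets, shape (Q, n_inputs)
    t     : target data, shape (Q,)
    alpha : weight decay parameter
    beta  : inverse noise variance parameter
    """
    residuals = t - np.array([f(p_q, w) for p_q in P])
    E_D = np.sum(residuals ** 2)      # data misfit, Eq. (12)
    E_w = np.sum(w ** 2)              # weight penalty, Eq. (13)
    return beta * E_D + alpha * E_w   # Eq. (11)
```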
Using the modified cost function, the gradient $\mathbf{g}$ and Hessian $\mathbf{H}$, respectively, are (with $\mathbf{J}$ the Jacobian of the residual vector $\mathbf{r}$ with respect to $\mathbf{w}$):
$$\mathbf{g} = 2\beta \mathbf{J}^{T}\mathbf{r} + 2\alpha \mathbf{w} \qquad (14)$$
$$\mathbf{H} = 2\beta \mathbf{J}^{T}\mathbf{J} + 2\alpha \mathbf{I} \qquad (15)$$
Thus, the increment of weights $\Delta\mathbf{w}$ becomes:
$$\Delta\mathbf{w} = -\left[ \beta \mathbf{J}^{T}\mathbf{J} + \left( \lambda + \alpha \right)\mathbf{I} \right]^{-1} \mathbf{g} \qquad (16)$$
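As an illustration of Eqs. (14)-(16), the sketch below performs one damped update step. It is a hedged NumPy rendering, not the source's implementation: it takes $\mathbf{J}$ as the Jacobian of the residuals, applies the step as $\mathbf{w} \leftarrow \mathbf{w} + \Delta\mathbf{w}$, and the function and argument names are assumptions:

```python
import numpy as np

def lm_step(w, J, r, alpha, beta, lam):
    """One Levenberg-Marquardt update under the cost of Eq. (11).

    J   : Jacobian of the residuals r = t - f(p, w) w.r.t. w, shape (Q, W)
    r   : residual vector, shape (Q,)
    lam : damping factor (lambda in Eq. (16))
    """
    W = w.size
    g = 2.0 * beta * J.T @ r + 2.0 * alpha * w           # gradient, Eq. (14)
    H = 2.0 * beta * J.T @ J + 2.0 * alpha * np.eye(W)   # Gauss-Newton Hessian, Eq. (15)
    # Damped step in the spirit of Eq. (16); the leading minus makes the
    # update a descent direction for the cost E(w).
    delta_w = -np.linalg.solve(H + lam * np.eye(W), g)
    return w + delta_w
```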
Furthermore, for the purpose of updating $\alpha$ and $\beta$, the Hessian formulation was utilized through the following equations:
$$\gamma = W - 2\alpha \, \mathrm{trace}\!\left(\mathbf{H}^{-1}\right) \qquad (17)$$
$$\alpha = \frac{\gamma}{2E_w} \qquad (18)$$
$$\beta = \frac{Q - \gamma}{2E_D} \qquad (19)$$
where: $\gamma$ is the effective number of parameters, that is, a measure of how many parameters or weights are effectively used in the NN learning with respect to the total number of weights $W$.
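A minimal sketch of the re-estimation step of Eqs. (17)-(19); explicitly inverting $\mathbf{H}$ is for clarity only, and the function and argument names are assumptions:

```python
import numpy as np

def update_hyperparameters(H, E_D, E_w, alpha, Q):
    """Re-estimate alpha and beta from Eqs. (17)-(19).

    H : Hessian from Eq. (15), shape (W, W)
    Q : number of training examples
    """
    W = H.shape[0]
    gamma = W - 2.0 * alpha * np.trace(np.linalg.inv(H))  # effective parameters, Eq. (17)
    alpha_new = gamma / (2.0 * E_w)                       # Eq. (18)
    beta_new = (Q - gamma) / (2.0 * E_D)                  # Eq. (19)
    return gamma, alpha_new, beta_new
```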