E(w) = \beta \sum_{q=1}^{Q} \left[ t_q - f(p_q, w) \right]^2 + \alpha \sum_{i=1}^{W} w_i^2    (10)

where: α is a weight decay parameter, β is an inverse noise variance parameter, t_q is the target data, f is the estimate realized by the NN, p is the vector of input sets, w is the vector of weights (and biases), Q is the number of training examples and W is the total number of weights.

Equation (10) can be further rewritten as:

E(w) = \beta E_D + \alpha E_w    (11)

where:

E_D = \sum_{q=1}^{Q} \left[ t_q - f(p_q, w) \right]^2    (12)

E_w = \sum_{i=1}^{W} w_i^2    (13)

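As a quick illustration of Eqs. (10)-(13), the following Python sketch evaluates E_D, E_w and the regularized cost for a generic network; numpy and the net(p, w) callable standing in for f(p_q, w) are illustrative assumptions, not part of the original formulation.

import numpy as np

def regularized_cost(w, net, p, t, alpha, beta):
    # net(p, w) is an assumed stand-in for the NN estimate f(p_q, w);
    # p holds the Q input sets, t the corresponding targets t_q.
    residuals = t - net(p, w)        # t_q - f(p_q, w)
    E_D = np.sum(residuals ** 2)     # sum-of-squares data error, Eq. (12)
    E_w = np.sum(w ** 2)             # sum-of-squares of the weights, Eq. (13)
    return beta * E_D + alpha * E_w  # E(w) = beta*E_D + alpha*E_w, Eq. (11)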
Using the modified cost function, the gradient g and Hessian H, respectively, are:

g = 2\beta J^T r + 2\alpha w    (14)

H = 2\beta J^T J + 2\alpha I    (15)

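Assuming J is the Jacobian of the residuals with respect to the weights and r the residual vector (both standard in Levenberg-Marquardt training and presumably defined earlier in the text), Eqs. (14) and (15) translate almost literally into code; this is a sketch with numpy as an assumed dependency.

import numpy as np

def gradient_and_hessian(J, r, w, alpha, beta):
    # Eq. (14): g = 2*beta*J^T r + 2*alpha*w
    g = 2.0 * beta * (J.T @ r) + 2.0 * alpha * w
    # Eq. (15): H = 2*beta*J^T J + 2*alpha*I (Gauss-Newton approximation)
    H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(w.size)
    return g, H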
Thus, the increment of weights Δw becomes:

\Delta w = \left[ \beta J^T J + (\lambda + \alpha) I \right]^{-1} g    (16)

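A minimal sketch of Eq. (16), assuming λ is the usual Levenberg-Marquardt damping factor; solving the linear system replaces the explicit matrix inverse for numerical stability, and the sign with which Δw is applied to the weights depends on how the residuals r in g are defined.

import numpy as np

def weight_increment(J, g, alpha, beta, lam):
    # Eq. (16): Delta_w = [beta*J^T J + (lam + alpha)*I]^(-1) g
    A = beta * (J.T @ J) + (lam + alpha) * np.eye(J.shape[1])
    return np.linalg.solve(A, g)  # solve A @ dw = g instead of inverting A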
Furthermore, for the purpose of updating α and β, the Hessian formulation was utilized through the following equations:

\gamma = W - 2\alpha \,\mathrm{trace}\left( H^{-1} \right)    (17)

\alpha = \frac{\gamma}{2 E_w}    (18)

\beta = \frac{Q - \gamma}{2 E_D}    (19)

where: γ is the effective number of parameters, that is, a measure of how many parameters or weights are effectively used in the NN learning with respect to the total number of weights W.
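The re-estimation rules in Eqs. (17)-(19) can be sketched as follows; H, E_D, E_w and the current α are assumed to come from the training step just completed, and the explicit inverse is kept only to mirror the trace(H^{-1}) term of Eq. (17).

import numpy as np

def update_hyperparameters(H, E_D, E_w, alpha, W, Q):
    # Eq. (17): effective number of parameters gamma
    gamma = W - 2.0 * alpha * np.trace(np.linalg.inv(H))
    alpha_new = gamma / (2.0 * E_w)        # Eq. (18)
    beta_new = (Q - gamma) / (2.0 * E_D)   # Eq. (19)
    return gamma, alpha_new, beta_new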