Let us start by considering the $k$th output perceptron with its weights being adjusted by gradient descent. When using the empirical Shannon entropy of the error, $H_S$, we apply expression (3.8), which we rewrite below for the $k$th output perceptron in vector notation:

$$
\frac{\partial H_S}{\partial \mathbf{w}_k} = \frac{1}{n^2 h^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{1}{f(\mathbf{e}_i)}\,G_h(\mathbf{e}_i-\mathbf{e}_j)\,(e_{ik}-e_{jk})\left(\frac{\partial e_{ik}}{\partial \mathbf{w}_k}-\frac{\partial e_{jk}}{\partial \mathbf{w}_k}\right). \tag{6.1}
$$
Whereas expression (3.8) contemplated the adjustment of a single weight, we now formulate the adjustment with respect to a whole vector of weights (including biases): the weight vector $\mathbf{w}_k$ of an arbitrary $k$th output perceptron. The derivative of $H_S$ with respect to the weights depends on the $n$ $c$-dimensional error vectors denoted $\mathbf{e}_i$ and $\mathbf{e}_j$. Each component $\partial H_S/\partial w_{lk}$ of the vector $\partial H_S/\partial \mathbf{w}_k$ in (6.1) can be conveniently expressed (namely, for implementation purposes) as the sum of all elements of the matrix resulting from:
$$
\frac{1}{n^2 h^2}
\begin{bmatrix}
\frac{1}{f(\mathbf{e}_1)} & \cdots & \frac{1}{f(\mathbf{e}_1)} \\
\vdots & \ddots & \vdots \\
\frac{1}{f(\mathbf{e}_n)} & \cdots & \frac{1}{f(\mathbf{e}_n)}
\end{bmatrix}
\mathbin{.\times}
\begin{bmatrix}
G_h(\mathbf{e}_1-\mathbf{e}_1) & \cdots & G_h(\mathbf{e}_1-\mathbf{e}_n) \\
\vdots & \ddots & \vdots \\
G_h(\mathbf{e}_n-\mathbf{e}_1) & \cdots & G_h(\mathbf{e}_n-\mathbf{e}_n)
\end{bmatrix}
$$
$$
\mathbin{.\times}
\begin{bmatrix}
e_{1k}-e_{1k} & \cdots & e_{1k}-e_{nk} \\
\vdots & \ddots & \vdots \\
e_{nk}-e_{1k} & \cdots & e_{nk}-e_{nk}
\end{bmatrix}
\mathbin{.\times}
\begin{bmatrix}
\dfrac{\partial e_{1k}}{\partial w_{lk}}-\dfrac{\partial e_{1k}}{\partial w_{lk}} & \cdots & \dfrac{\partial e_{1k}}{\partial w_{lk}}-\dfrac{\partial e_{nk}}{\partial w_{lk}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial e_{nk}}{\partial w_{lk}}-\dfrac{\partial e_{1k}}{\partial w_{lk}} & \cdots & \dfrac{\partial e_{nk}}{\partial w_{lk}}-\dfrac{\partial e_{nk}}{\partial w_{lk}}
\end{bmatrix}
\tag{6.2}
$$
where '$.\times$' denotes the element-wise (Hadamard) product [212]. The first matrix is not present when Rényi's quadratic entropy or the information potential is used (see also expression (3.9)).
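For implementation purposes, expression (6.2) maps directly onto element-wise array products. Below is a minimal NumPy sketch of this computation, assuming a spherical Gaussian kernel $G_h$; the function name and argument layout are illustrative, not taken from the text.

```python
import numpy as np

def shannon_entropy_grad_component(E, dEk_dw, h, k):
    """Sum-of-all-elements form (6.2) for a single weight w_lk (a sketch).

    E       : (n, c) array, row i holding the error vector e_i
    dEk_dw  : (n,) array, entry i holding d e_ik / d w_lk
    h       : kernel bandwidth
    k       : index of the output perceptron being adjusted
    """
    n, c = E.shape
    # pairwise Gaussian kernel values G_h(e_i - e_j), an (n, n) matrix
    diffs = E[:, None, :] - E[None, :, :]
    G = np.exp(-np.sum(diffs**2, axis=2) / (2.0 * h**2)) \
        / ((np.sqrt(2.0 * np.pi) * h) ** c)
    # Parzen density estimates f(e_i) = (1/n) sum_j G_h(e_i - e_j)
    f_hat = G.mean(axis=1)
    # first matrix of (6.2): 1/f(e_i) replicated along each row;
    # absent for Renyi's quadratic entropy / information potential
    F = np.repeat((1.0 / f_hat)[:, None], n, axis=1)
    # pairwise differences e_ik - e_jk, and of the derivatives w.r.t. w_lk
    Dk = E[:, k][:, None] - E[None, :, k]
    Dd = dEk_dw[:, None] - dEk_dw[None, :]
    # element-wise ('.x') products, then the sum of all matrix elements
    return (F * G * Dk * Dd).sum() / (n**2 * h**2)
```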
Once all $n$ error vectors (for the $n$ input vectors $\mathbf{x}_i$) relative to the $m$th training epoch have been obtained, one is able to compute the updated weights of the output perceptron:

$$
\mathbf{w}_k^{(m)} = \mathbf{w}_k^{(m-1)} + \Delta\mathbf{w}_k^{(m-1)}, \quad\text{with}\quad \Delta\mathbf{w}_k^{(m-1)} = -\,\eta\,\frac{\partial H_S}{\partial \mathbf{w}_k}, \tag{6.3}
$$

where $\eta$ is the learning rate.
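A hedged sketch of the epoch update (6.3) follows, assuming the gradient vector is assembled component-wise from (6.2) with the function sketched above; `eta`, `dEk_dw_all` (the $(n,\,\text{num\_weights})$ array of error derivatives), and the bandwidth value are illustrative placeholders.

```python
# assemble dH_S/dw_k, one component per weight w_lk, then apply (6.3)
grad_HS = np.array([
    shannon_entropy_grad_component(E, dEk_dw_all[:, l], h=0.8, k=0)
    for l in range(w_k.size)
])
eta = 0.01                      # learning rate (illustrative value)
w_k = w_k - eta * grad_HS       # w^(m) = w^(m-1) + Delta w^(m-1)
```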
The updating of the weight vector $\mathbf{w}_l$, relative to an arbitrary $l$th perceptron of the hidden layer, is done as usual with the back-propagation algorithm. One needs all the back-propagated errors from the output layer (incident dotted arrows in Fig. 6.1). Denoting by $\varphi(\cdot)$ the activation function, assumed the same for all perceptrons, the updating vector for $\mathbf{w}_l$ at the $m$th training epoch then follows from the usual chain rule.
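As a generic sketch of that chain rule (the $\delta$ and $\mathrm{net}$ notation is ours, illustrative rather than taken from the surrounding text), the standard back-propagation form of this update is

$$
\Delta\mathbf{w}_l^{(m-1)} = -\,\eta\,\delta_l\,\mathbf{x}, \qquad \delta_l = \varphi'(\mathrm{net}_l)\sum_{k=1}^{c} w_{kl}\,\delta_k,
$$

where $\delta_k$ are the output-layer sensitivities of $H_S$, $\mathrm{net}_l$ is the net input of the $l$th hidden perceptron, and $\mathbf{x}$ is its input vector.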