$$
\left(\frac{\partial y_m}{\partial w_{mj}}\right)^k
= \left(\frac{\partial y_m}{\partial v_m}\right)^k \left(\frac{\partial v_m}{\partial w_{mj}}\right)^k
= f'(v_m^k)\, x_j^k ,
$$
where $x_j^k$ is the value of input j of the network for example k.
For a neuron m that receives the quantity $x_j$ from input j of the network, or from neuron j, through other neurons of the network located between input (or neuron) j and neuron m,
$$
\left(\frac{\partial y_m}{\partial w_{ij}}\right)^k
= \left(\frac{\partial y_m}{\partial v_m}\right)^k \left(\frac{\partial v_m}{\partial w_{ij}}\right)^k
= f'(v_m^k) \sum_l \left(\frac{\partial v_m}{\partial y_l}\right)^k \left(\frac{\partial y_l}{\partial w_{ij}}\right)^k
= f'(v_m^k) \sum_l w_{ml} \left(\frac{\partial y_l}{\partial w_{ij}}\right)^k ,
$$
where subscript l denotes all neurons that are adjacent to neuron m in the graph of connections between neuron j (or input j) and neuron m.
By using those relations recursively, the derivatives of the output of each
neuron with respect to the parameters can be computed, from the inputs to
the outputs of the network.
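To make the forward recursion concrete, here is a minimal sketch in Python, under stated assumptions: the network is stored as a dictionary mapping connections (m, l) to weights w_ml, the activation function is tanh, and the node names and numerical values are invented for illustration only (none of them come from the text). forward_pass computes the potentials v_m and outputs y_m from the inputs to the outputs; forward_gradient then applies the recursion above to obtain the derivative of every neuron output with respect to a given weight w_ij.

```python
import math

def f(v):                                # activation function (illustrative choice)
    return math.tanh(v)

def f_prime(v):                          # its derivative
    return 1.0 - math.tanh(v) ** 2

def forward_pass(weights, neuron_order, inputs):
    """Compute the potential v_m and output y_m of every neuron."""
    y = dict(inputs)                     # node values: inputs, then neuron outputs
    v = {}
    for m in neuron_order:               # topological order, from inputs to outputs
        v[m] = sum(w * y[l] for (n, l), w in weights.items() if n == m)
        y[m] = f(v[m])
    return v, y

def forward_gradient(weights, neuron_order, v, y, i, j):
    """Forward recursion: d y_m / d w_ij for every neuron m of the network."""
    dy = {node: 0.0 for node in y}
    dy[i] = f_prime(v[i]) * y[j]         # direct term: f'(v_i) * x_j
    for m in neuron_order:
        if m == i:
            continue
        s = sum(w * dy[l] for (n, l), w in weights.items() if n == m)
        dy[m] = f_prime(v[m]) * s        # f'(v_m) * sum_l w_ml * d y_l / d w_ij
    return dy

# Illustrative two-input, two-hidden-neuron, one-output network (invented values).
weights = {('h1', 'x1'): 0.5, ('h1', 'x2'): -0.3,
           ('h2', 'x1'): 0.8, ('h2', 'x2'): 0.1,
           ('o', 'h1'): 1.2,  ('o', 'h2'): -0.7}
order = ['h1', 'h2', 'o']
v, y = forward_pass(weights, order, {'x1': 0.4, 'x2': -1.0})
print(forward_gradient(weights, order, v, y, 'h1', 'x1')['o'])   # d y_o / d w_{h1,x1}
```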
Once those derivatives are computed, the gradient of the partial cost function can be derived as
$$
\frac{\partial J^k}{\partial w_{ij}}
= \frac{\partial}{\partial w_{ij}} \left( y_p^k - g(x^k, w) \right)^2
= -2 \left( y_p^k - g(x^k, w) \right) \frac{\partial g(x^k, w)}{\partial w_{ij}} .
$$
Furthermore, $g(x^k, w)$ is the output of a neuron of the network; therefore, the last derivative can be computed recursively by the same procedure. Once the gradient of the partial cost has been computed for each example, the gradient of the total cost function is obtained by summation over all examples.
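Continuing the illustrative sketch above (it reuses the weights, order, forward_pass and forward_gradient defined there), the following lines compute the gradient of the partial cost for one example and sum it over a small, invented training set; the data and node names are again assumptions made only for the example.

```python
# Continues the previous sketch: reuses weights, order, forward_pass and
# forward_gradient. The training set below is purely illustrative.

def partial_cost_gradient(output_node, example, i, j):
    """dJ^k/dw_ij = -2 (y_p^k - g(x^k, w)) * dg(x^k, w)/dw_ij."""
    inputs, y_p = example
    v, y = forward_pass(weights, order, inputs)
    dg = forward_gradient(weights, order, v, y, i, j)[output_node]
    return -2.0 * (y_p - y[output_node]) * dg

def total_cost_gradient(output_node, training_set, i, j):
    """Gradient of the total cost: sum of the partial gradients over all examples."""
    return sum(partial_cost_gradient(output_node, ex, i, j) for ex in training_set)

training_set = [({'x1': 0.4, 'x2': -1.0}, 0.2),    # pairs (inputs x^k, measured output y_p^k)
                ({'x1': -0.6, 'x2': 0.3}, -0.5)]
print(total_cost_gradient('o', training_set, 'h1', 'x1'))
```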
Comparison Between Forward Computation of the Gradient of the Cost
Function and Backpropagation
The above discussion shows that backpropagation requires the evaluation of one gradient per neuron, whereas forward computation requires the evaluation of one gradient per connection. Since the number of connections is roughly the square of the number of neurons, the number of gradient evaluations is larger for forward computation of the gradient than for backpropagation.
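As a rough, purely illustrative order of magnitude: a fully connected network with about 100 neurons has on the order of 10,000 connections, so forward computation of the gradient performs roughly 10,000 derivative recursions per example, against roughly 100 for backpropagation.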
Therefore, backpropagation will be used for the evaluation of the gradient
of the cost function in the training of feedforward neural networks. For recur-
rent neural networks, however, forward computation is sometimes mandatory,
as shown in the section devoted to the training of recurrent neural networks.
Evaluation of the Gradient of the Cost Function under Constraint: The
Shared Weight Technique
When training recurrent neural networks, as discussed in the section devoted to black-box dynamic modeling and in Chap. 4, and when training some