In order to compute its gradient, one can compute the gradient of the
partial cost function J k ( w ) related to observation k , and subsequently sum
over all examples.
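In symbols, with $J(w)$ denoting the total cost and $J_k(w)$ the partial cost of example $k$ as in the text:

$$ \frac{\partial J(w)}{\partial w_{ij}} = \sum_k \frac{\partial J_k(w)}{\partial w_{ij}} . $$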
Backpropagation consists essentially of the repeated application of the chain rule. First, one notices that the partial cost function depends on $w_{ij}$ only through the value of the output of neuron $i$, which itself is a function of the potential of neuron $i$ only; therefore, one has
$$ \frac{\partial J_k}{\partial w_{ij}} = \left(\frac{\partial J_k}{\partial v_i}\right)_k \left(\frac{\partial v_i}{\partial w_{ij}}\right)_k = \delta_i^k \, x_j^k , $$
where
• $(\partial J_k / \partial v_i)_k = \delta_i^k$ is the value of the gradient of the partial cost function with respect to the potential of neuron $i$ when the inputs of the network are the variables of example $k$;
• $(\partial v_i / \partial w_{ij})_k$ is the value of the partial derivative of the potential of neuron $i$ with respect to parameter $w_{ij}$ when the inputs of the network are the variables of example $k$;
• $x_j^k$ is the value of input $j$ of neuron $i$ when the inputs of the network are the variables of example $k$.
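The factor $(\partial v_i / \partial w_{ij})_k = x_j^k$ can be illustrated directly: the potential is linear in the weights. A minimal numerical check (a single three-input neuron; the tanh nonlinearity plays no role here, and all names and values below are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: one neuron i with three inputs x_j (shapes are assumptions).
x = rng.normal(size=3)        # inputs x_j of neuron i for example k
w_i = rng.normal(size=3)      # weights w_ij of neuron i

def potential(w, x):
    """Potential v_i of neuron i: weighted sum of its inputs."""
    return w @ x

# Analytically, dv_i/dw_ij = x_j, since the potential is linear in the weights.
grad_analytic = x

# Finite-difference check of dv_i/dw_ij for each j.
eps = 1e-6
grad_fd = np.array([
    (potential(w_i + eps * e, x) - potential(w_i - eps * e, x)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad_analytic, grad_fd))  # True
```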
As the sketch above suggests, the computation of the last two quantities is straightforward. The only problem is the computation of $\delta_i^k$ on the right-hand side of the equation. These quantities can advantageously be computed recursively, from the outputs to the inputs, as follows.
For output neuron $i$,

$$ \delta_i^k = \left(\frac{\partial J_k}{\partial v_i}\right)_k = \left(\frac{\partial}{\partial v_i}\bigl(y_p - g(x, w)\bigr)^2\right)_k = -2\bigl(y_p^k - g(x^k, w)\bigr)\left(\frac{\partial g(x, w)}{\partial v_i}\right)_k . $$
The output $g(x, w)$ of the model is the output $y_i$ of the output neuron; therefore, the above relation can be written as

$$ \delta_i^k = -2\bigl(y_p^k - g(x^k, w)\bigr) f'(v_i^k), $$

where $f'(v_i^k)$ is the derivative of the activation function of the output neuron when the network inputs are those of example $k$. Usually, for a feedforward neural network designed for modeling, the activation function of the output neuron is linear, so that the above relation reduces to

$$ \delta_i^k = -2\bigl(y_p^k - g(x^k, w)\bigr). $$
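This can be checked numerically. In the minimal sketch below (the scalar values and names are illustrative assumptions), the analytic $\delta_i^k$ for a linear output neuron is compared with a finite-difference derivative of the partial cost:

```python
import numpy as np

# Toy least-squares partial cost for one example k (values are assumptions):
# the output neuron is linear, so its output g equals its potential v_i.
y_p = 1.3          # desired output y_p^k
v_i = 0.4          # potential of the output neuron for example k
g = v_i            # linear activation: g(x^k, w) = v_i, hence f'(v_i) = 1

def J_k(v):
    """Partial cost of example k as a function of the output potential."""
    return (y_p - v) ** 2

# delta_i^k = dJ_k/dv_i = -2 (y_p^k - g) for a linear output neuron.
delta_analytic = -2.0 * (y_p - g)

eps = 1e-6
delta_fd = (J_k(v_i + eps) - J_k(v_i - eps)) / (2 * eps)
print(np.allclose(delta_analytic, delta_fd))  # True
```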
For hidden neuron $i$, the cost function depends on the potential of neuron $i$ only through the potentials of the neurons $m$ that receive the output of neuron $i$, i.e., all neurons that are adjacent to neuron $i$ in the graph of connections of the network and are located between that neuron and the output:
$$ \delta_i^k = \left(\frac{\partial J_k}{\partial v_i}\right)_k = \sum_m \left(\frac{\partial J_k}{\partial v_m}\right)_k \left(\frac{\partial v_m}{\partial v_i}\right)_k = \sum_m \delta_m^k \left(\frac{\partial v_m}{\partial v_i}\right)_k . $$
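Putting the two rules together gives the full backward recursion. The following sketch assembles it for a small network; the one-hidden-layer architecture, the tanh hidden activation, and all shapes and names are illustrative assumptions. For the single linear output neuron, $\partial v_m / \partial v_i = w_{mi} f'(v_i)$, and the resulting gradients $\delta_i^k x_j^k$ are verified against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feedforward network (shapes and names are illustrative assumptions):
# tanh hidden layer, single linear output neuron, least-squares partial cost.
x   = rng.normal(size=4)        # inputs of example k
W1  = rng.normal(size=(3, 4))   # hidden weights w_ij
w2  = rng.normal(size=3)        # output weights
y_p = 0.7                       # desired output y_p^k

def forward(W1, w2, x):
    v_h = W1 @ x                # hidden potentials v_i
    y_h = np.tanh(v_h)          # hidden outputs (inputs of the output neuron)
    g   = w2 @ y_h              # linear output neuron: g = v_out
    return v_h, y_h, g

v_h, y_h, g = forward(W1, w2, x)

# Backward recursion, from the output toward the inputs.
delta_out = -2.0 * (y_p - g)          # output neuron (linear, so f' = 1)
# Hidden neuron i: the only downstream neuron m is the output neuron, and
# dv_m/dv_i = w2[i] * f'(v_i) with f'(v) = 1 - tanh(v)^2.
delta_h = delta_out * w2 * (1.0 - y_h ** 2)

grad_W1 = np.outer(delta_h, x)        # dJ_k/dw_ij = delta_i^k * x_j^k
grad_w2 = delta_out * y_h             # same rule applied to the output weights

# Finite-difference check of dJ_k/dw_ij on the hidden weights.
def J_k(W1):
    return (y_p - forward(W1, w2, x)[2]) ** 2

eps, grad_fd = 1e-6, np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        E = np.zeros_like(W1); E[i, j] = eps
        grad_fd[i, j] = (J_k(W1 + E) - J_k(W1 - E)) / (2 * eps)

print(np.allclose(grad_W1, grad_fd))  # True
```

Because the recursion runs from the outputs toward the inputs, each $\delta_m^k$ is already available when $\delta_i^k$ is needed, so one forward pass and one backward sweep per example suffice.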