In order to compute its gradient, one can compute the gradient of the partial cost function $J_k(w)$ related to observation $k$, and subsequently sum over all examples.
Backpropagation consists essentially in a repeated application of the rule of chained derivatives. First, one notices that the partial cost function depends on $w_{ij}$ only through the value of the output of neuron $i$, which itself is a function of the potential of neuron $i$ only; therefore, one has
$$\frac{\partial J_k}{\partial w_{ij}} = \left(\frac{\partial J_k}{\partial v_i}\right)_k \left(\frac{\partial v_i}{\partial w_{ij}}\right)_k = \delta_i^k\, x_j^k,$$
where
• $\left(\dfrac{\partial J_k}{\partial v_i}\right)_k$ is the value of the gradient of the partial cost function with respect to the potential of neuron $i$ when the inputs of the network are the variables of example $k$;
• $\left(\dfrac{\partial v_i}{\partial w_{ij}}\right)_k$ is the value of the partial derivative of the potential of neuron $i$ with respect to parameter $w_{ij}$ when the inputs of the network are the variables of example $k$;
• $x_j^k$ is the value of input $j$ of neuron $i$ when the inputs of the network are the variables of example $k$.
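This factorization can be checked numerically on a single neuron. The sketch below (illustrative values, not from the text) computes $\delta\, x_j$ for one tanh neuron with a squared-error partial cost, and compares it with a finite-difference estimate of $\partial J_k/\partial w_j$:

```python
import math

def neuron_cost(w, x, y_p):
    """Partial cost J_k = (y_p - f(v))^2 with potential v = sum_j w_j x_j, f = tanh."""
    v = sum(wj * xj for wj, xj in zip(w, x))
    return (y_p - math.tanh(v)) ** 2

def analytic_gradient(w, x, y_p):
    """Chain-rule gradient: dJ_k/dw_j = delta * x_j, with delta = dJ_k/dv."""
    v = sum(wj * xj for wj, xj in zip(w, x))
    y = math.tanh(v)
    delta = -2.0 * (y_p - y) * (1.0 - y * y)  # dJ_k/dv for a tanh neuron
    return [delta * xj for xj in x]

# Example k: inputs x and desired output y_p (arbitrary illustrative values)
w, x, y_p = [0.5, -0.3], [1.0, 2.0], 0.8
grad = analytic_gradient(w, x, y_p)

# Finite-difference check of each component of the gradient
eps = 1e-6
for j in range(len(w)):
    w_plus = list(w); w_plus[j] += eps
    w_minus = list(w); w_minus[j] -= eps
    fd = (neuron_cost(w_plus, x, y_p) - neuron_cost(w_minus, x, y_p)) / (2 * eps)
    print(j, grad[j], fd)
```

The two columns printed for each weight should agree to several decimal places, confirming that the gradient factors into $\delta_i^k x_j^k$.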
The computation of the last two quantities is straightforward. The only problem is the computation of $\delta_i^k$ on the right-hand side of the equation. These quantities can be advantageously computed recursively from the outputs to the inputs, as follows.
• For output neuron $i$,
$$\delta_i^k = \left(\frac{\partial J_k}{\partial v_i}\right)_k = \left[\frac{\partial}{\partial v_i}\bigl(y_p - g(x, w)\bigr)^2\right]_k = -2\bigl(y_p^k - g(x^k, w)\bigr)\left(\frac{\partial g(x, w)}{\partial v_i}\right)_k.$$
The output $g(x, w)$ of the model is the output $y_i$ of the output neuron; therefore the above relation can be written as
$$\delta_i^k = -2\bigl(y_p^k - g(x^k, w)\bigr)\, f'(v_i^k),$$
where $f'(v_i^k)$ is the derivative of the activation function of the output neuron when the network inputs are those of example $k$. Usually, for a feedforward neural network designed for modeling, the activation function of the output neuron is linear, so that the above relation reduces to
$$\delta_i^k = -2\bigl(y_p^k - g(x^k, w)\bigr).$$
• For hidden neuron $i$, the cost function depends on the potential of neuron $i$ only through the potentials of the neurons $m$ that receive the value of the output of neuron $i$, i.e., of all neurons that are adjacent to neuron $i$ in the graph of the connections of the network and are located between that neuron and the output:
$$\delta_i^k \equiv \left(\frac{\partial J_k}{\partial v_i}\right)_k = \sum_m \left(\frac{\partial J_k}{\partial v_m}\right)_k \left(\frac{\partial v_m}{\partial v_i}\right)_k = \sum_m \delta_m^k \left(\frac{\partial v_m}{\partial v_i}\right)_k.$$
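The backward recursion above can be sketched for a network with one hidden layer of tanh neurons and a linear output neuron. In this sketch the names (`W1`, `w2`, `forward`, `backprop`) and the numeric values are illustrative assumptions, not taken from the text; for a tanh hidden neuron, $\partial v_m/\partial v_i = w_{mi} f'(v_i)$ with $f'(v_i) = 1 - \tanh^2(v_i)$:

```python
import math

def forward(W1, w2, x):
    """Forward pass: tanh hidden layer, linear output neuron."""
    v_hidden = [sum(wij * xj for wij, xj in zip(row, x)) for row in W1]
    h = [math.tanh(v) for v in v_hidden]
    g = sum(w2m * hm for w2m, hm in zip(w2, h))  # linear output: g = v_out
    return h, g

def backprop(W1, w2, x, y_p):
    """Backward recursion from the output neuron to the hidden neurons."""
    h, g = forward(W1, w2, x)
    # Output neuron with linear activation: delta = -2 (y_p - g)
    delta_out = -2.0 * (y_p - g)
    # Hidden neuron i: delta_i = f'(v_i) * sum_m delta_m * w_mi,
    # with f'(v_i) = 1 - tanh(v_i)^2 = 1 - h_i^2 (single output neuron here)
    delta_hidden = [(1.0 - hi * hi) * delta_out * w2i for hi, w2i in zip(h, w2)]
    # Gradients: dJ_k/dw_ij = delta_i * x_j (the inputs of the output neuron are h)
    grad_W1 = [[di * xj for xj in x] for di in delta_hidden]
    grad_w2 = [delta_out * hi for hi in h]
    return grad_W1, grad_w2

# Illustrative weights and one training example
W1 = [[0.1, -0.2], [0.3, 0.4]]
w2 = [0.5, -0.6]
x, y_p = [1.0, -1.0], 0.7

grad_W1, grad_w2 = backprop(W1, w2, x, y_p)

# Finite-difference check on one hidden-layer weight
eps = 1e-6
W1p = [row[:] for row in W1]; W1p[0][1] += eps
W1m = [row[:] for row in W1]; W1m[0][1] -= eps
Jp = (y_p - forward(W1p, w2, x)[1]) ** 2
Jm = (y_p - forward(W1m, w2, x)[1]) ** 2
print(grad_W1[0][1], (Jp - Jm) / (2 * eps))
```

Because the deltas of the hidden layer are obtained from the delta of the output neuron, the gradient of the partial cost with respect to every weight is computed in a single backward sweep, which is the point of the recursion.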