Because the cost function (6.1) sums the error contributions of individual examples, its partial derivative with respect to a weight $w_{jk}$ from unit $j$ to unit $k$ is a sum of components $\partial E^{(i)}/\partial w_{jk}$ that can be computed for each example $i$ separately. The key idea of backpropagation is that $\partial E^{(i)}/\partial w_{jk}$ can be expressed in terms of a backpropagated error $\delta_k^{(i)}$ and the source activity $o_j^{(i)}$ present at the weight:
$$\frac{\partial E^{(i)}}{\partial w_{jk}} = o_j^{(i)} \cdot \delta_k^{(i)}. \qquad (6.4)$$
The backpropagated error $\delta_k^{(i)}$ of a hidden unit is a weighted sum of the errors $\delta_l^{(i)}$ of all units $l$ receiving input from unit $k$, multiplied with the derivative of the transfer function $f_k$ producing the output $o_k^{(i)} = f_k(\xi_k^{(i)})$ of unit $k$:
$$\delta_k^{(i)} = \frac{df_k}{d\xi}\!\left(\xi_k^{(i)}\right) \sum_l w_{kl}\, \delta_l^{(i)} \qquad \text{[hidden unit]}, \qquad (6.5)$$
with $\xi_k^{(i)} = \sum_j w_{jk}\, o_j^{(i)}$ describing the weighted sum of the inputs to $k$.
If unit $k$ is an output unit, its error component can be computed directly:
$$\delta_k^{(i)} = \frac{df_k}{d\xi}\!\left(\xi_k^{(i)}\right) \left(o_k^{(i)} - y_k^{(i)}\right) \qquad \text{[output unit]}, \qquad (6.6)$$
where $y_k^{(i)}$ is the component of the target vector $y^{(i)}$ that corresponds to unit $k$.
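To make the recursion in Equations 6.4-6.6 concrete, the following sketch backpropagates the error of a single example through a small two-layer network with sigmoidal units. The function name, array shapes, and choice of transfer function are assumptions made for illustration, not taken from the book's implementation.

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def backprop_example(x, y, W1, W2):
    """One example: forward pass, then Eqs. (6.4)-(6.6).

    x  : input activities o_j     (shape: n_in)
    y  : target vector            (shape: n_out)
    W1 : weights input -> hidden  (shape: n_in, n_hid)
    W2 : weights hidden -> output (shape: n_hid, n_out)
    """
    # Forward pass: xi_k = sum_j w_jk o_j,  o_k = f_k(xi_k)
    xi_hid = x @ W1
    o_hid = sigmoid(xi_hid)
    xi_out = o_hid @ W2
    o_out = sigmoid(xi_out)

    # Eq. (6.6), output units: delta_k = f'_k(xi_k) * (o_k - y_k)
    delta_out = o_out * (1.0 - o_out) * (o_out - y)

    # Eq. (6.5), hidden units: delta_k = f'_k(xi_k) * sum_l w_kl delta_l
    delta_hid = o_hid * (1.0 - o_hid) * (W2 @ delta_out)

    # Eq. (6.4): dE/dw_jk = o_j * delta_k
    grad_W2 = np.outer(o_hid, delta_out)
    grad_W1 = np.outer(x, delta_hid)
    return grad_W1, grad_W2
```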
The backpropagation technique can be applied to the Neural Abstraction Pyra-
mid architecture. Since the basic processing element, described in Section 4.2.1, is a
two-layered feed-forward neural network, directed acyclic graphs of such process-
ing elements form a large feed-forward neural network with shared weights.
A simple modification is needed for the update of shared weights: the weight-updates computed for the individual instances of a weight are summed, and this sum is added to the shared weight. By replacing the weight instances with multiplicative units that receive an additional input from a single unit outputting the value of the shared weight, one can show that this indeed modifies the weight in the direction of the negative gradient [193].
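As a hedged illustration of this rule, the sketch below accumulates the per-instance gradients of one shared weight before applying a single update; the function name, the instance-gradient list, and the learning rate are assumptions for the example.

```python
def update_shared_weight(w, instance_grads, learning_rate=0.01):
    """Update a shared weight from the gradients of its instances.

    instance_grads holds dE/dw for every place the weight is used
    (o_j * delta_k per instance, Eq. 6.4); their sum is the gradient
    of the cost with respect to the shared weight.
    """
    total_grad = sum(instance_grads)
    return w - learning_rate * total_grad
```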
When implementing error backpropagation in the Neural Abstraction Pyramid,
one must also take care to handle the border effects correctly. The simplest case
is when the border cells of a feature array are set to a constant value. Since the
derivative of a constant is zero, the error component arriving at these border cells
does not need to be propagated any further. In contrast, if the activity of a border
cell is copied from a feature cell, the error component arriving at it must be added
to the error component of that feature cell.
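The two border cases can be illustrated with a one-dimensional feature array. This is a simplified sketch under assumed conventions (a single padding cell on each side, the border copied from the nearest feature cell), not the implementation described in the book: errors arriving at constant border cells are discarded, while errors arriving at copied border cells are added back to their source feature cells.

```python
import numpy as np

def backpropagate_border(error, border_mode):
    """Fold the error of a padded 1-D feature array back onto its cells.

    error       : errors for [left border, feature cells..., right border]
    border_mode : 'constant' -> border error is dropped (the derivative
                  of a constant is zero)
                  'copy'     -> border error is added to the feature cell
                  whose activity was copied to the border
    """
    core = error[1:-1].copy()
    if border_mode == 'copy':
        core[0] += error[0]    # left border copied from first cell
        core[-1] += error[-1]  # right border copied from last cell
    # 'constant': border errors are not propagated any further
    return core
```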
Because the weights of a projection unit are stored as an adjacency list in the
template of the unit, it is easiest to implement the sum in Equation 6.5 by accumulat-
ing contributions from the units receiving inputs from it. As the network is traversed