Because the cost function (6.1) sums the error contributions of individual examples, its partial derivative with respect to a weight $w_{jk}$ from unit $j$ to unit $k$ is a sum of components $\partial E^{(i)}/\partial w_{jk}$ that can be computed for each example $i$ separately. The key idea of backpropagation is that $\partial E^{(i)}/\partial w_{jk}$ can be expressed in terms of a backpropagated error $\delta_k^{(i)}$ and the source activity $o_j^{(i)}$ present at the weight:

\[
\frac{\partial E^{(i)}}{\partial w_{jk}} = o_j^{(i)} \cdot \delta_k^{(i)}. \tag{6.4}
\]
The backpropagated error $\delta_k^{(i)}$ of a hidden unit is a weighted sum of the errors $\delta_l^{(i)}$ of all units $l$ receiving input from unit $k$, multiplied with the derivative of the transfer function $f_k$, which produces the output $o_k^{(i)} = f_k(\xi_k^{(i)})$ of unit $k$:

\[
\delta_k^{(i)} = \frac{df_k}{d\xi_k^{(i)}} \sum_l w_{kl}\, \delta_l^{(i)} \qquad \text{[hidden unit]}, \tag{6.5}
\]

with $\xi_k^{(i)} = \sum_j w_{jk}\, o_j^{(i)}$ describing the weighted sum of the inputs to $k$.
If unit $k$ is an output unit, its error component can be computed directly:

\[
\delta_k^{(i)} = \frac{df_k}{d\xi_k^{(i)}}\, \bigl(o_k^{(i)} - y_k^{(i)}\bigr) \qquad \text{[output unit]}, \tag{6.6}
\]

where $y_k^{(i)}$ is the component of the target vector $y^{(i)}$ that corresponds to unit $k$.
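As a concrete illustration, the following NumPy sketch computes the backpropagated errors and weight gradients of Equations (6.4)-(6.6) for a plain layered network. The layer layout, the variable names, and the derivative helper `f_prime` are assumptions made for this example, not code from the text.

```python
import numpy as np

def backprop_deltas(o, xi, W, y, f_prime):
    """Backpropagated errors and gradients for one example i,
    following Eqs. (6.4)-(6.6). Layers are indexed 0..L; W[l] holds
    the weights w_jk from layer l to layer l+1 (shape n_l x n_{l+1}),
    o[l] the outputs, and xi[l] the weighted input sums of layer l."""
    L = len(W)
    delta = [None] * (L + 1)
    # Output units, Eq. (6.6): delta_k = df_k/dxi_k * (o_k - y_k)
    delta[L] = f_prime(xi[L]) * (o[L] - y)
    # Hidden units, Eq. (6.5): delta_k = df_k/dxi_k * sum_l w_kl * delta_l
    for l in range(L - 1, 0, -1):
        delta[l] = f_prime(xi[l]) * (W[l] @ delta[l + 1])
    # Weight gradients, Eq. (6.4): dE/dw_jk = o_j * delta_k
    grads = [np.outer(o[l], delta[l + 1]) for l in range(L)]
    return delta, grads

# For f = tanh, the derivative is df/dxi = 1 - tanh(xi)^2:
f_prime = lambda xi: 1.0 - np.tanh(xi) ** 2
```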
The backpropagation technique can be applied to the Neural Abstraction Pyramid architecture. Since the basic processing element, described in Section 4.2.1, is a two-layered feed-forward neural network, directed acyclic graphs of such processing elements form a large feed-forward neural network with shared weights.
A simple modification is needed for the update of shared weights: the sum of all weight updates that have been computed for the individual instances of a weight is added to it. By replacing the weight instances with multiplicative units that receive an additional input from a single unit outputting the value of the shared weight, one can show that this indeed modifies the weight in the direction of the negative gradient [193].
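A minimal sketch of this shared-weight update, assuming the gradient of each weight instance has already been obtained via Equation (6.4); the function and argument names are illustrative, not from the text.

```python
def update_shared_weight(w, instance_grads, learning_rate):
    """Gradient-descent step for a shared weight: the updates computed
    for the individual instances of the weight are summed and applied
    once, moving w in the direction of the negative gradient."""
    total_grad = sum(instance_grads)   # accumulate over all instances
    return w - learning_rate * total_grad
```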
When implementing error backpropagation in the Neural Abstraction Pyramid,
one must also take care to handle the border effects correctly. The simplest case
is when the border cells of a feature array are set to a constant value. Since the
derivative of a constant is zero, the error component arriving at these border cells
does not need to be propagated any further. In contrast, if the activity of a border
cell is copied from a feature cell, the error component arriving at it must be added
to the error component of that feature cell.
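The two border cases might be handled as in the following sketch for a feature array with a one-cell border; the array layout, the `border_mode` flag, and the assumption that border activities are copied from the nearest interior cells are illustrative choices, not the book's implementation.

```python
import numpy as np

def fold_border_error(err, border_mode):
    """Fold the error arriving at the one-cell border of a feature
    array back into its interior. err has shape (h+2, w+2); the
    outermost ring is the border. Returns the (h, w) interior error."""
    interior = err[1:-1, 1:-1].copy()
    if border_mode == 'copy':
        # Border activities were copied from adjacent feature cells,
        # so their error components are added to those cells.
        interior[0, :]  += err[0, 1:-1]   # top edge
        interior[-1, :] += err[-1, 1:-1]  # bottom edge
        interior[:, 0]  += err[1:-1, 0]   # left edge
        interior[:, -1] += err[1:-1, -1]  # right edge
        interior[0, 0]   += err[0, 0]     # corners
        interior[0, -1]  += err[0, -1]
        interior[-1, 0]  += err[-1, 0]
        interior[-1, -1] += err[-1, -1]
    # 'constant' border: the derivative of a constant is zero,
    # so the border error is simply discarded.
    return interior
```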
Because the weights of a projection unit are stored as an adjacency list in the template of the unit, it is easiest to implement the sum in Equation 6.5 by accumulating contributions from the units receiving inputs from it. As the network is traversed