cannot be computed by such networks. Later, it was shown that feed-forward networks with enough nonlinear hidden units can approximate any continuous function over a compact domain [104].
6.2.1 Error Backpropagation
Key to the success of feed-forward neural networks was gradient-based learning.
Gradient-based learning algorithms minimize a cost function E by gradient descent.
Frequently, the quadratic approximation error is used as cost function:
E = \frac{1}{2} \sum_{i=1}^{N} \| o_i - y_i \|^2, \qquad (6.1)

where o_i is the output of the network when the i-th example x_i is presented to it, and y_i is the desired output for that example.
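As a concrete illustration (not from the original text), the cost (6.1) can be evaluated directly on a batch of network outputs and desired outputs; the array names below are purely illustrative:

    import numpy as np

    def quadratic_error(outputs, targets):
        """Quadratic approximation error E = 1/2 * sum_i ||o_i - y_i||^2, cf. (6.1)."""
        diff = outputs - targets          # o_i - y_i for every example i
        return 0.5 * np.sum(diff ** 2)

    # Three 2-dimensional outputs and their desired values.
    o = np.array([[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]])
    y = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(quadratic_error(o, y))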
In a gradient descent method, a parameter w is modified according to the partial
derivative of the error function E with respect to it:
\Delta w = -\eta \frac{\partial E}{\partial w}, \qquad (6.2)

where η > 0 is the learning rate. If η is chosen small enough, the repeated application of (6.2) lowers E until a local minimum is reached.
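As a minimal sketch (assuming a scalar parameter and a hand-coded gradient; all names here are illustrative), one step of (6.2) and its repeated application might look as follows:

    def gradient_descent_step(w, grad_E, eta=0.1):
        """One application of (6.2): move w against the gradient of E."""
        return w - eta * grad_E(w)

    # Toy cost E(w) = (w - 3)^2 with gradient dE/dw = 2 * (w - 3).
    grad = lambda w: 2.0 * (w - 3.0)
    w = 0.0
    for _ in range(100):                  # repeated application lowers E
        w = gradient_descent_step(w, grad)
    print(w)                              # approaches the minimum at w = 3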
The simplest example of gradient descent learning is the delta rule. It is applicable to linear networks without hidden units: o^{(i)} = w x_i. For the weights w_j of the network's output unit, the learning rule (6.2) can be rewritten as:

\Delta w_j = -\eta \sum_{i=1}^{N} \left( o^{(i)} - y^{(i)} \right) x_j^{(i)}, \qquad (6.3)

where x_j^{(i)} is the j-th component of the input vector x_i.
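The delta rule (6.3) translates almost directly into code. The following is a hedged sketch for a single linear output unit, where X holds the input vectors x_i as rows and y the corresponding desired outputs (the data are made up for illustration):

    import numpy as np

    def delta_rule_epoch(w, X, y, eta=0.005):
        """Batch delta rule: w_j <- w_j - eta * sum_i (o^(i) - y^(i)) * x_j^(i)."""
        o = X @ w                         # linear outputs o^(i) = w . x_i
        return w - eta * X.T @ (o - y)

    # Targets generated by a known linear mapping, so the solution is known.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    w = np.zeros(3)
    for _ in range(200):
        w = delta_rule_epoch(w, X, y)
    print(w)                              # approaches [1.0, -2.0, 0.5]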
In order to make the gradient descent idea work, differentiable transfer functions
are needed. In the example above, the transfer function was linear and hence could
be omitted from the analysis. In multi-layered networks, it is necessary to have
nonlinear transfer functions for the hidden units, to make them more powerful than
networks without hidden units. One frequently used non-linear transfer function is
the sigmoid o = f_{sig}(\xi) = \frac{1}{1 + e^{-\xi}} that has already been discussed in Section 4.2.4. Its derivative can be expressed in terms of its output: f'_{sig}(\xi) = o(1 - o).
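A short numerical check of this identity (the function names are illustrative, not from the text):

    import numpy as np

    def f_sig(xi):
        """Sigmoid transfer function 1 / (1 + exp(-xi))."""
        return 1.0 / (1.0 + np.exp(-xi))

    def f_sig_prime(xi):
        """Derivative expressed via the output: o * (1 - o)."""
        o = f_sig(xi)
        return o * (1.0 - o)

    xi = np.linspace(-5.0, 5.0, 11)
    numeric = (f_sig(xi + 1e-6) - f_sig(xi - 1e-6)) / 2e-6   # central differences
    print(np.allclose(numeric, f_sig_prime(xi), atol=1e-6))  # True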
For the gradient computation in multi-layered networks, the idea of backpropagation of an error signal was introduced, e.g., by Rumelhart et al. [200]. The error
backpropagation technique is an efficient method to compute the partial derivative
of a cost function with respect to a weight, in the same way as the ordered update
of the network activity in the feed-forward mode is an efficient method to compute
the network output. As the name suggests, backpropagation visits the nodes of the
network in the opposite order of the feed-forward step.
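To make this concrete, a hedged sketch of the forward and backward pass for a network with one sigmoid hidden layer and linear output units follows; it is not the book's code, and the weight matrices W1, W2 and the per-example quadratic cost are assumptions for illustration:

    import numpy as np

    def f_sig(xi):
        return 1.0 / (1.0 + np.exp(-xi))

    def forward(x, W1, W2):
        """Feed-forward pass: ordered update from input to output."""
        h = f_sig(W1 @ x)                  # hidden activities
        o = W2 @ h                         # linear output
        return h, o

    def backward(x, y, W1, W2):
        """Backpropagation: visit the layers in reverse order to obtain dE/dW."""
        h, o = forward(x, W1, W2)
        delta_o = o - y                            # output error signal (quadratic cost)
        delta_h = (W2.T @ delta_o) * h * (1 - h)   # error propagated through the sigmoid
        dW2 = np.outer(delta_o, h)                 # partial derivatives w.r.t. W2
        dW1 = np.outer(delta_h, x)                 # partial derivatives w.r.t. W1
        return dW1, dW2

    # Illustrative shapes: 4 inputs, 3 hidden units, 2 outputs.
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3))
    x, y = rng.normal(size=4), rng.normal(size=2)
    dW1, dW2 = backward(x, y, W1, W2)
    print(dW1.shape, dW2.shape)            # (3, 4) (2, 3)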