cannot be computed by such networks. Later, it was shown that feed-forward networks with enough nonlinear hidden units can approximate any continuous function over a compact domain [104].
6.2.1 Error Backpropagation
Key to the success of feed-forward neural networks was gradient-based learning.
Gradient-based learning algorithms minimize a cost function E by gradient descent.
Frequently, the quadratic approximation error is used as cost function:
E = \frac{1}{2} \sum_{i=1}^{N} \| o_i - y_i \|^2, \qquad (6.1)

where o_i is the output of the network when the i-th example x_i is presented to it, and y_i is the desired output for that example.
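As a concrete illustration (not from the original text), the cost (6.1) can be evaluated directly on a batch of network outputs and desired outputs; the array names below are purely illustrative:

    import numpy as np

    def quadratic_error(outputs, targets):
        """Quadratic approximation error E = 1/2 * sum_i ||o_i - y_i||^2, cf. (6.1)."""
        diff = outputs - targets          # o_i - y_i for every example i
        return 0.5 * np.sum(diff ** 2)

    # Three 2-dimensional outputs and their desired values.
    o = np.array([[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]])
    y = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(quadratic_error(o, y))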
In a gradient descent method, a parameter w is modified according to the partial
derivative of the error function E with respect to it:
\Delta w = -\eta \frac{\partial E}{\partial w}, \qquad (6.2)

where η > 0 is the learning rate. If η is chosen small enough, the repeated application of (6.2) lowers E until a local minimum is reached.
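As a minimal sketch (assuming a scalar parameter and a hand-coded gradient; all names here are illustrative), one step of (6.2) and its repeated application might look as follows:

    def gradient_descent_step(w, grad_E, eta=0.1):
        """One application of (6.2): move w against the gradient of E."""
        return w - eta * grad_E(w)

    # Toy cost E(w) = (w - 3)^2 with gradient dE/dw = 2 * (w - 3).
    grad = lambda w: 2.0 * (w - 3.0)
    w = 0.0
    for _ in range(100):                  # repeated application lowers E
        w = gradient_descent_step(w, grad)
    print(w)                              # approaches the minimum at w = 3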
The simplest example of gradient descent learning is the delta rule. It is applicable to linear networks without hidden units: o^{(i)} = w x_i. For the weights w_j of the network's output unit, the learning rule (6.2) can be rewritten as:

\Delta w_j = -\eta \sum_{i=1}^{N} \left( o^{(i)} - y^{(i)} \right) x_j^{(i)}, \qquad (6.3)

where x_j^{(i)} is the j-th component of the input vector x_i.
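The delta rule (6.3) translates almost directly into code. The following is a hedged sketch for a single linear output unit, where X holds the input vectors x_i as rows and y the corresponding desired outputs (the data are made up for illustration):

    import numpy as np

    def delta_rule_epoch(w, X, y, eta=0.005):
        """Batch delta rule: w_j <- w_j - eta * sum_i (o^(i) - y^(i)) * x_j^(i)."""
        o = X @ w                         # linear outputs o^(i) = w . x_i
        return w - eta * X.T @ (o - y)

    # Targets generated by a known linear mapping, so the solution is known.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    w = np.zeros(3)
    for _ in range(200):
        w = delta_rule_epoch(w, X, y)
    print(w)                              # approaches [1.0, -2.0, 0.5]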
In order to make the gradient descent idea work, differentiable transfer functions
are needed. In the example above, the transfer function was linear and hence could
be omitted from the analysis. In multi-layered networks, it is necessary to have
nonlinear transfer functions for the hidden units, to make them more powerful than
networks without hidden units. One frequently used non-linear transfer function is
the sigmoid o = f_{sig}(\xi) = \frac{1}{1 + e^{-\xi}} that has already been discussed in Section 4.2.4. Its derivative can be expressed in terms of its output: f'_{sig}(\xi) = o(1 - o).
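A short numerical check of this identity (the function names are illustrative, not from the text):

    import numpy as np

    def f_sig(xi):
        """Sigmoid transfer function 1 / (1 + exp(-xi))."""
        return 1.0 / (1.0 + np.exp(-xi))

    def f_sig_prime(xi):
        """Derivative expressed via the output: o * (1 - o)."""
        o = f_sig(xi)
        return o * (1.0 - o)

    xi = np.linspace(-5.0, 5.0, 11)
    numeric = (f_sig(xi + 1e-6) - f_sig(xi - 1e-6)) / 2e-6   # central differences
    print(np.allclose(numeric, f_sig_prime(xi), atol=1e-6))  # True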
For the gradient computation in multi-layered networks, the idea of backpropagation of an error signal was introduced, e.g., by Rumelhart et al. [200]. The error
backpropagation technique is an efficient method to compute the partial derivative
of a cost function with respect to a weight, in the same way as the ordered update
of the network activity in the feed-forward mode is an efficient method to compute
the network output. As the name suggests, backpropagation visits the nodes of the
network in the opposite order of the feed-forward step.
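To make this concrete, a hedged sketch of the forward and backward pass for a network with one sigmoid hidden layer and linear output units follows; it is not the book's code, and the weight matrices W1, W2 and the per-example quadratic cost are assumptions for illustration:

    import numpy as np

    def f_sig(xi):
        return 1.0 / (1.0 + np.exp(-xi))

    def forward(x, W1, W2):
        """Feed-forward pass: ordered update from input to output."""
        h = f_sig(W1 @ x)                  # hidden activities
        o = W2 @ h                         # linear output
        return h, o

    def backward(x, y, W1, W2):
        """Backpropagation: visit the layers in reverse order to obtain dE/dW."""
        h, o = forward(x, W1, W2)
        delta_o = o - y                            # output error signal (quadratic cost)
        delta_h = (W2.T @ delta_o) * h * (1 - h)   # error propagated through the sigmoid
        dW2 = np.outer(delta_o, h)                 # partial derivatives w.r.t. W2
        dW1 = np.outer(delta_h, x)                 # partial derivatives w.r.t. W1
        return dW1, dW2

    # Illustrative shapes: 4 inputs, 3 hidden units, 2 outputs.
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3))
    x, y = rng.normal(size=4), rng.normal(size=2)
    dW1, dW2 = backward(x, y, W1, W2)
    print(dW1.shape, dW2.shape)            # (3, 4) (2, 3)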