The weight is modified only if successive gradients do not change the direction: $\nabla_{ij} E(t-1) \cdot \nabla_{ij} E(t) \geq 0$. The step size is modified according to:
$$
\Delta_{ij}(t) =
\begin{cases}
\eta^{+} \cdot \Delta_{ij}(t-1) & \text{if } \nabla_{ij} E(t-1) \cdot \nabla_{ij} E(t) > 0, \\
\eta^{-} \cdot \Delta_{ij}(t-1) & \text{if } \nabla_{ij} E(t-1) \cdot \nabla_{ij} E(t) < 0, \\
\Delta_{ij}(t-1) & \text{else.}
\end{cases}
\tag{6.10}
$$
The step size is increased by a factor $\eta^{+}$ if successive updates go in the same direction and decreased by multiplying it with $\eta^{-}$ if the gradient direction changes. In the latter case, $\nabla_{ij} E(t)$ is set to zero to avoid a step size update in the next iteration of the algorithm. The factors comply with $0 < \eta^{-} < 1 < \eta^{+}$. Recommended values are $\eta^{-} = 0.5$ for fast deceleration and $\eta^{+} = 1.2$ for cautious acceleration of learning. The step sizes are initialized uniformly to $\Delta_{0}$. It is ensured that they do not leave the interval $[\Delta_{\min}, \Delta_{\max}]$. The RPROP algorithm has been shown to be robust to the choice of its parameters [107]. It is easy to implement and requires only the storage of two additional quantities per weight: the last gradient and the step size.
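To make the update rule concrete, the following is a minimal NumPy sketch of one RPROP iteration over a weight array. The factors $\eta^{+} = 1.2$ and $\eta^{-} = 0.5$ follow the recommendations above; the numeric bounds `step_min` and `step_max` and the initial step size are assumptions, since the text does not fix $\Delta_{\min}$, $\Delta_{\max}$, or $\Delta_{0}$ numerically.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One RPROP iteration for a weight array w.

    grad and prev_grad are the current and previous gradients of E
    with respect to w; step holds the per-weight step sizes.
    step_min and step_max are assumed bounds for the interval
    [Delta_min, Delta_max].
    """
    change = grad * prev_grad    # sign of the product of successive gradients
    grew = change > 0            # same direction: accelerate cautiously
    shrank = change < 0          # direction flipped: decelerate fast
    step = np.where(grew, np.minimum(step * eta_plus, step_max), step)
    step = np.where(shrank, np.maximum(step * eta_minus, step_min), step)
    # After a sign change the gradient is set to zero, so the weight is
    # not moved and the next iteration takes the "else" branch of (6.10).
    grad = np.where(shrank, 0.0, grad)
    w = w - np.sign(grad) * step # move each weight against its gradient sign
    return w, grad, step         # the returned grad becomes prev_grad
```

The step sizes would be initialized uniformly, e.g. `step = np.full_like(w, 0.1)` for an assumed $\Delta_{0} = 0.1$.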
Mini Batches. RPROP as well as other advanced optimization techniques are batch
methods because they need an accurate estimate of the gradient. For real-world
tasks, the training set is frequently large and contains many similar examples. In
this case, it is very expensive to consider all training examples before updating the
weights.
One idea to overcome this difficulty is to only use subsets of the training set, so-
called mini batches, to estimate the gradient. This is a compromise between batch
training and online learning. The easiest way to implement mini batches is to update the weights after every $n$ examples. Another possibility is to work with randomly chosen subsets of
the training set. Møller [162] investigated the effect of training with mini batches.
He proposed to start with a small set and to enlarge it as the training proceeds.
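The two simple mini-batch schemes mentioned above, fixed slices of $n$ examples and randomly chosen subsets, might be sketched as follows; the function names and parameters are hypothetical.

```python
import numpy as np

def sequential_batches(num_examples, n):
    """Yield index ranges so that the weights are updated every n examples."""
    for start in range(0, num_examples, n):
        yield np.arange(start, min(start + n, num_examples))

def random_batch(num_examples, n, rng):
    """Estimate the gradient from a randomly chosen subset of size n."""
    return rng.choice(num_examples, size=n, replace=False)
```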
For the RPROP algorithm, the gradient estimate must not only be accurate but
stable as well. Because the signs of successive gradient evaluations determine the
adaptation of the learning rates, fluctuations of the gradient estimate may slow down
the convergence.
For this reason, I used RPROP with slowly changing random subsets of the train-
ing set. A small working set of examples is initialized randomly. In each iteration
only a small fraction of the set is replaced by new randomly chosen examples. As
training proceeds, the network error for most of the examples will be low. Hence, the
size of the working set must be increased to include enough informative examples.
The last few iterations of the learning algorithm are done with the entire training set.
Such an approach can speed up the training significantly since most iterations of
the learning algorithm can be done with only a small fraction of the training set. In
addition, the ability of the network to learn the task can be judged quickly because
during the first iterations, the working set is still small.
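A slowly changing working set could look like the sketch below. The replacement fraction and the growth amount are assumptions; the text only states that a small fraction of the set is replaced per iteration, that the set grows as training proceeds, and that the last few iterations use the entire training set.

```python
import numpy as np

def update_working_set(working, num_examples, rng,
                       replace_frac=0.05, grow=0):
    """Replace a small fraction of the working set with fresh random
    examples and optionally grow it by `grow` examples.

    working is an index array into the training set; replace_frac and
    grow are assumed parameters, not values given in the text.
    """
    k = max(1, int(replace_frac * len(working)))
    # keep most of the current working set ...
    keep = rng.choice(working, size=len(working) - k, replace=False)
    # ... and draw replacements (plus any growth) from the rest
    pool = np.setdiff1d(np.arange(num_examples), keep)
    fresh = rng.choice(pool, size=k + grow, replace=False)
    return np.concatenate([keep, fresh])

# Usage sketch:
# rng = np.random.default_rng(0)
# working = rng.choice(num_examples, size=initial_size, replace=False)
# per iteration: working = update_working_set(working, num_examples, rng,
#                                             grow=growth_for_this_iteration)
# final iterations: train on np.arange(num_examples), i.e. the full set
```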