TABLE 7.2 Some training algorithms that are particularly tied to a feedforward network.

Back-propagation
  Steepest gradient descent (SGD): Determines the direction and magnitude of the weight updates from the derivatives of the classification error.
  Gradient descent with momentum (GDM): An extension of gradient descent that adds a momentum term to the SGD weight-adjustment formula.
  Resilient propagation (RP): Attempts to speed up training by using an adaptive variable to set the magnitude of the weight update, while the direction of the update is given by the sign of the derivatives of the classification error.

Conjugate gradient
  Fletcher-Reeves (CGF): Computes the conjugate of the previous search direction using the Fletcher-Reeves algorithm and then employs a line search to determine the optimal size of the weight update.
  Polak-Ribiere (CGP): Similar to the CGF method, differing only in the conjugate-direction computation; it updates the conjugate direction using the Polak-Ribiere algorithm.
  Powell-Beale (CGB): Uses the same learning model as the CGF algorithm to compute the conjugate direction but resets the search direction to the negative of the gradient.
  Scaled conjugate gradient (SCG): A variation of the conjugate gradient method with a scaled step size; it combines the model-trust-region approach with the conjugate gradient approach to scale the step size.

Quasi-Newton
  BFGS (Broyden, Fletcher, Goldfarb, and Shanno) (BFG): Approximates the Hessian matrix by a function of the gradient to reduce the computational and storage requirements.
  Levenberg-Marquardt (LM): A combination of gradient descent and the Gauss-Newton method; it locates the minimum of a function that is expressed as the sum of squares of nonlinear functions.
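To make the distinction between the first two back-propagation entries in the table concrete, the following is a minimal NumPy sketch (not taken from the source) of the SGD and GDM weight updates; `grad_fn`, the learning rate, and the momentum value are illustrative assumptions.

```python
import numpy as np

def sgd_step(w, grad_fn, lr=0.01):
    # Steepest gradient descent: step against the derivative of the
    # classification error with respect to the weights.
    return w - lr * grad_fn(w)

def gdm_step(w, velocity, grad_fn, lr=0.01, momentum=0.9):
    # Gradient descent with momentum: the SGD adjustment plus a momentum
    # term that reuses a fraction of the previous weight update.
    velocity = momentum * velocity - lr * grad_fn(w)
    return w + velocity, velocity
```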
7.2.3.3 Quasi-Newton method
The quasi-Newton method is an improved optimization method developed from Newton's method that uses standard numerical optimization techniques to train neural networks (Demuth, Beale and Hagan, 2008). Although Newton's method usually converges faster than the conjugate gradient method, it has two major drawbacks. First, it requires computing, storing, and inverting the Hessian matrix, which rapidly increases the computational complexity and requires a large amount of memory. Second, Newton's method may not converge on non-quadratic error surfaces (Duda, Hart and Stork, 2001). The quasi-Newton method instead updates an approximate Hessian matrix at each iteration as a function of the gradient, rather than computing the matrix of second-order derivatives used by Newton's method. This substantially reduces the computational complexity. Compared with other methods, however, the quasi-Newton method is best suited to small networks with a limited number of weights.
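As a rough illustration of this idea (not taken from the book), the sketch below shows a BFGS-style update in which the approximate inverse Hessian is refined from successive gradient differences; the variable names (`H`, `s`, `y`) and the step computation are assumptions made for the example.

```python
import numpy as np

def bfgs_inverse_hessian_update(H, s, y):
    # s = w_new - w_old and y = grad_new - grad_old from the last iteration.
    # The inverse-Hessian estimate is refined from gradients only, so no
    # second-order derivatives are ever computed or inverted.
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def quasi_newton_step(w, grad, H_inv):
    # The search direction uses the approximate inverse Hessian.
    return w - H_inv @ grad
```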
The most successful quasi-Newton method is the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) algorithm, which approximates the Hessian matrix by a function of the gradient to reduce the computational and storage requirements. In addition, the Levenberg-Marquardt (LM) algorithm is considered a quasi-Newton method since it also seeks second-order training speed by using an approximate Hessian matrix. As a combination of steepest descent and the Gauss-Newton method, the LM algorithm locates the minimum of a function that is expressed as the sum of squares of non-linear functions.
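For illustration only (the book gives no code), one LM weight update for a sum-of-squares error can be sketched as follows; `J` is the Jacobian of the residuals with respect to the weights, and the damping value `lam` is an assumed example setting that blends Gauss-Newton behaviour (small `lam`) with gradient descent (large `lam`).

```python
import numpy as np

def lm_step(w, residuals, J, lam=1e-3):
    # Damped Gauss-Newton system: (J^T J + lam * I) delta = J^T r
    A = J.T @ J + lam * np.eye(J.shape[1])
    g = J.T @ residuals          # gradient of 0.5 * sum of squared residuals
    delta = np.linalg.solve(A, g)
    return w - delta
```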
7.3 Internal parameters and classification accuracy
7.3.1 Experimental design
In this focused study, we assessed a set of topological and training parameters that affect the image classification accuracy of MLP neural networks. Three of the four parameters controlling the MLP network topology were considered, namely the number of hidden layers, the type of activation function, and the training threshold; the number of neurons was excluded because prior studies have found that image classification accuracies are less sensitive to this factor (e.g., Gong, Pu and Chen, 1996; Foody and Arora, 1997; Paola and Schowengerdt, 1997; Shupe and Marsh, 2004). The GDM algorithm was used to train the MLP networks, and three related training parameters were considered, namely the learning rate, the momentum, and the number of iterations. The six internal parameters considered are summarized in Table 7.3.
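As a hedged sketch of how these six parameters might map onto code, the configuration below uses scikit-learn's MLPClassifier as a stand-in; the study does not prescribe this library, scikit-learn's SGD solver with a momentum term only approximates the GDM algorithm, and `tol` is used here as a rough analogue of the training threshold. All specific values are placeholders.

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(10,),  # number of hidden layers (and neurons per layer)
    activation="logistic",     # type of activation function
    solver="sgd",              # gradient descent training
    learning_rate_init=0.01,   # learning rate
    momentum=0.9,              # momentum
    max_iter=1000,             # number of iterations
    tol=1e-4,                  # rough analogue of the training threshold
)
# mlp.fit(training_pixels, training_labels) would then train the classifier.
```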