TABLE 7.2 Some training algorithms that are particularly tied to a feedforward network.

Back-propagation
  Steepest gradient descent (SGD): Determines the direction and magnitude of the weight updates from the derivatives of the classification error.
  Gradient descent with momentum (GDM): An extension of gradient descent that adds a momentum term to the SGD weight-adjustment formula.
  Resilient propagation (RP): Attempts to speed up training by using an adaptive variable to set the magnitude of the weight update, while the direction of the update is given by the sign of the derivatives of the classification error.

Conjugate gradient
  Fletcher-Reeves (CGF): Computes the conjugate of the previous search direction using the Fletcher-Reeves algorithm and then employs a line search to determine the optimal size of the weight update.
  Polak-Ribiere (CGP): Similar to the CGF method, differing only in the conjugate-direction computation; it updates the conjugate direction using the Polak-Ribiere algorithm.
  Powell-Beale (CGB): Uses the same learning model as the CGF algorithm to compute the conjugate direction but resets the search direction to the negative of the gradient.
  Scaled conjugate gradient (SCG): A variation of the conjugate gradient method with a scaled step size; it combines the model-trust-region approach with the conjugate gradient approach to scale the step size.

Quasi-Newton
  BFGS (Broyden, Fletcher, Goldfarb, and Shanno) (BFG): Approximates the Hessian matrix by a function of the gradient to reduce the computational and storage requirements.
  Levenberg-Marquardt (LM): A combination of gradient descent and the Gauss-Newton method; it locates the minimum of a function that is expressed as the sum of squares of nonlinear functions.
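To make the distinction between the first two back-propagation entries in the table concrete, the following is a minimal NumPy sketch (not taken from the source) of the SGD and GDM weight updates; `grad_fn`, the learning rate, and the momentum value are illustrative assumptions.

```python
import numpy as np

def sgd_step(w, grad_fn, lr=0.01):
    # Steepest gradient descent: step against the derivative of the
    # classification error with respect to the weights.
    return w - lr * grad_fn(w)

def gdm_step(w, velocity, grad_fn, lr=0.01, momentum=0.9):
    # Gradient descent with momentum: the SGD adjustment plus a momentum
    # term that reuses a fraction of the previous weight update.
    velocity = momentum * velocity - lr * grad_fn(w)
    return w + velocity, velocity
```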
7.2.3.3 Quasi-Newton method
The quasi-Newton method is an improved optimization method developed from Newton's method that uses standard numerical optimization techniques to train neural networks (Demuth, Beale and Hagan, 2008). Although Newton's method usually converges faster than the conjugate gradient method, it has two major drawbacks. First, it requires computing, storing, and inverting the Hessian matrix, which rapidly increases the computational complexity and requires a large amount of memory. Second, Newton's method may not converge on non-quadratic error surfaces (Duda, Hart and Stork, 2001). The quasi-Newton method instead updates an approximate Hessian matrix at each iteration as a function of the gradient, rather than computing the matrix of second-order derivatives used by Newton's method. This substantially reduces the computational complexity. Compared with other methods, however, the quasi-Newton method is best suited to small networks with a limited number of weights.
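As a rough illustration of this idea (not taken from the book), the sketch below shows a BFGS-style update in which the approximate inverse Hessian is refined from successive gradient differences; the variable names (`H`, `s`, `y`) and the step computation are assumptions made for the example.

```python
import numpy as np

def bfgs_inverse_hessian_update(H, s, y):
    # s = w_new - w_old and y = grad_new - grad_old from the last iteration.
    # The inverse-Hessian estimate is refined from gradients only, so no
    # second-order derivatives are ever computed or inverted.
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def quasi_newton_step(w, grad, H_inv):
    # The search direction uses the approximate inverse Hessian.
    return w - H_inv @ grad
```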
The most successful quasi-Newton method is the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) algorithm, which approximates the Hessian matrix by a function of the gradient to reduce the computational and storage requirements. In addition, the Levenberg-Marquardt (LM) algorithm is considered a quasi-Newton method since it also seeks second-order training speed by using an approximate Hessian matrix. As a combination of steepest descent and the Gauss-Newton method, the LM algorithm locates the minimum of a function that is expressed as the sum of squares of non-linear functions.
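For illustration only (the book gives no code), one LM weight update for a sum-of-squares error can be sketched as follows; `J` is the Jacobian of the residuals with respect to the weights, and the damping value `lam` is an assumed example setting that blends Gauss-Newton behaviour (small `lam`) with gradient descent (large `lam`).

```python
import numpy as np

def lm_step(w, residuals, J, lam=1e-3):
    # Damped Gauss-Newton system: (J^T J + lam * I) delta = J^T r
    A = J.T @ J + lam * np.eye(J.shape[1])
    g = J.T @ residuals          # gradient of 0.5 * sum of squared residuals
    delta = np.linalg.solve(A, g)
    return w - delta
```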
7.3 Internal parameters and classification accuracy
7.3.1 Experimental design
In this focused study, we assessed a set of topological and training parameters that affect the image classification accuracy of MLP neural networks. Three of the four parameters controlling the MLP network topology were considered, namely the number of hidden layers, the type of activation function, and the training threshold; the number of neurons was excluded because prior studies have found that image classification accuracies are less sensitive to this factor (e.g., Gong, Pu and Chen, 1996; Foody and Arora, 1997; Paola and Schowengerdt, 1997; Shupe and Marsh, 2004). The GDM algorithm was used to train the MLP networks, and three related training parameters were considered, namely the learning rate, the momentum, and the number of iterations. The six internal parameters considered are summarized in Table 7.3.
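As a hedged sketch of how these six parameters might map onto code, the configuration below uses scikit-learn's MLPClassifier as a stand-in; the study does not prescribe this library, scikit-learn's SGD solver with a momentum term only approximates the GDM algorithm, and `tol` is used here as a rough analogue of the training threshold. All specific values are placeholders.

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(10,),  # number of hidden layers (and neurons per layer)
    activation="logistic",     # type of activation function
    solver="sgd",              # gradient descent training
    learning_rate_init=0.01,   # learning rate
    momentum=0.9,              # momentum
    max_iter=1000,             # number of iterations
    tol=1e-4,                  # rough analogue of the training threshold
)
# mlp.fit(training_pixels, training_labels) would then train the classifier.
```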