In supervised learning (also known as learning with a teacher or associative learning), each
example or training pattern has an associated correct response (also termed the teacher signal)
which is made available to the CNN output units. In unsupervised learning, there are no pre-specified
correct responses against which the network can compare its output. Unsupervised learning
is typically based on some variation of Hebbian and/or competitive learning and in most cases
involves clustering, or detecting similarities among, unlabelled patterns within a given
training set. The intention in this instance is to optimise some form of comprehensive performance
function or evaluation criterion defined in terms of the output activities of the PEs within the
CNN. In each application, the weights and the outputs of the CNN are expected to converge to
representations that capture the statistical regularities of the training data.
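To make the contrast concrete, the following minimal sketch compares a supervised delta-rule update, which relies on a teacher signal, with an unsupervised Hebbian update, which uses only the input and output activities of a single processing element (PE). The learning rate, teacher signal and toy data are illustrative assumptions rather than values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)        # input pattern (illustrative toy data)
w = 0.1 * rng.normal(size=5)  # weight vector of a single PE
eta = 0.01                    # learning rate (assumed value)

# Supervised (delta-rule) update: the teacher signal t is known, and the
# weight change is driven by the error between t and the unit's output.
t = 1.0                   # teacher signal / correct response
y = np.dot(w, x)          # linear output of the PE
w_supervised = w + eta * (t - y) * x

# Unsupervised (Hebbian) update: no teacher signal is available; the weights
# simply strengthen correlations between input and output activities.
y = np.dot(w, x)
w_hebbian = w + eta * y * x
```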
A wide variety of learning algorithms is now available for solving both supervised
and unsupervised learning problems, most of which have been designed for specific net-
work architectures. Most of these learning algorithms, especially those intended for supervised
learning in feedforward networks, have their roots in traditional function-minimisation
procedures and can be classified as either local or global search heuristics (error-minimisation
strategies). Learning algorithms are termed local if the computations needed to update
each weight in the CNN can be performed using information that is available on a local basis to
that specific weight. This requirement could, for example, be motivated by a desire to implement
learning algorithms in parallel hardware. Local minimisation algorithms (such as those based
on gradient-descent, conjugate-gradient and quasi-Newton methods) are fast but will often
converge to a local minimum, with an increased chance of producing a sub-optimal solution. In contrast,
global minimisation algorithms, such as simulated annealing and evolutionary computation,
possess heuristic strategies that enable them to escape from local minima. However, each
class of algorithm is weak where the other is strong: good local search procedures tend to be
poor at global search, and vice versa. Gradient information illustrates this point. Such knowledge
is not just useful but often of prime importance in local search procedures, yet it is not
exploited in simulated annealing or evolutionary computation. Conversely, gradient-descent
algorithms, even with numerous multistart possibilities, are prone to becoming trapped in local
minima and will thus often produce sub-optimal solutions; that is, they are weak in global
search. Designing more efficient algorithms for CNN learning is thus an active research topic
for neurocomputing specialists.
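The trade-off between local and global search can be illustrated with a short sketch: plain gradient descent on a non-convex error surface converges to whichever local minimum lies downhill from its starting point, and a multistart wrapper simply repeats that local search from several random initialisations. The error function and all parameter values below are assumptions chosen purely for illustration.

```python
import numpy as np

def error(w):
    # Toy non-convex error surface with several local minima (assumed form).
    return np.sin(3.0 * w) + 0.1 * w ** 2

def grad(w):
    # Analytic gradient of the toy error surface.
    return 3.0 * np.cos(3.0 * w) + 0.2 * w

def gradient_descent(w0, eta=0.01, steps=500):
    # Local search: follows the downhill gradient from a single start point
    # and therefore settles in whichever local minimum lies nearby.
    w = w0
    for _ in range(steps):
        w -= eta * grad(w)
    return w

# Multistart heuristic: repeat the local search from many random initial
# weights and keep the best solution found; this improves, but does not
# guarantee, global search behaviour.
rng = np.random.default_rng(0)
starts = rng.uniform(-5.0, 5.0, size=20)
best = min((gradient_descent(w0) for w0 in starts), key=error)
print(f"best weight: {best:.3f}, error: {error(best):.3f}")
```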
One critical issue for the successful application of a CNN concerns the complex relationship
that exists between learning (training) and generalisation. It is important to stress that the ultimate
goal of network training is not to learn an exact representation of the training data, but rather
to build a model of the underlying process(es) which generated that data, in order to achieve
good generalisation (i.e. good out-of-sample performance). One simple method for optimising
the generalisation performance of a neural network model is to control its effective complexity,
with complexity in this case being measured in terms of the number of network parameters. This
problem of finding the optimal complexity for a neural network model, although often considered
crucial for a successful application, has until now been somewhat neglected in most CNN-based
GC applications.
In principle, there are three main approaches that can be used to control overfitting
(see Fischer 2005), the first and third of which are sketched in the example below:
1. Regularisation techniques, that is, adding an extra term to the error function that is designed
to penalise those mappings which are not smooth
2. Pruning techniques, that is, starting with an oversized network and removing inconsequential
links or nodes using automated procedures (e.g. Fischer et al. 1994; Abrahart et al. 1999)
3. Cross-validation techniques to determine when to stop training, for example, the early
stopping heuristic demonstrated in Fischer and Gopal (1994)
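As a hedged illustration of the first and third approaches, the sketch below adds a weight-decay penalty to a sum-of-squares error function and stops training once the validation error has not improved for a fixed number of epochs. The simple linear model, the penalty coefficient, the learning rate and the patience value are all assumptions made for the example; they are not prescriptions from the sources cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(80, 3)), rng.normal(size=80)
X_val, y_val = rng.normal(size=(20, 3)), rng.normal(size=20)

lam = 0.01       # weight-decay (regularisation) coefficient, assumed
eta = 0.01       # learning rate, assumed
patience = 10    # early-stopping patience in epochs, assumed

def penalised_error(w, X, y):
    # Sum-of-squares error plus a penalty term that discourages large
    # (non-smooth) weight values.
    residual = X @ w - y
    return 0.5 * residual @ residual + 0.5 * lam * w @ w

w = np.zeros(3)
best_val, best_w, wait = np.inf, w.copy(), 0
for epoch in range(1000):
    # Gradient-descent step on the regularised training error.
    grad = X_train.T @ (X_train @ w - y_train) + lam * w
    w -= eta * grad
    val = penalised_error(w, X_val, y_val)
    if val < best_val:          # validation error still improving
        best_val, best_w, wait = val, w.copy(), 0
    else:                       # stop once it has stalled for `patience` epochs
        wait += 1
        if wait >= patience:
            break
```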