In supervised learning (also known as learning with a teacher or associative learning), each
example or training pattern has an associated correct response (also termed the teacher signal)
which is made available to the CNN output units. In unsupervised learning, there are no pre-specified
correct responses against which the network can compare its output. Unsupervised learning
is typically based on some variation of Hebbian and/or competitive learning and in most cases
involves clustering, or detecting similarities among, unlabelled patterns within a given
training set. The intention in this instance is to optimise some form of comprehensive performance
function or evaluation criterion defined in terms of the output activities of the PEs within the
CNN. In each application, the weights and the outputs of the CNN are expected to converge to
representations that capture the statistical regularities of the training data.
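To make the contrast concrete, the following minimal sketch compares a supervised delta-rule update, which relies on a teacher signal, with an unsupervised Hebbian update, which uses only the input and output activities of a single processing element (PE). The learning rate, teacher signal and toy data are illustrative assumptions rather than values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)        # input pattern (illustrative toy data)
w = 0.1 * rng.normal(size=5)  # weight vector of a single PE
eta = 0.01                    # learning rate (assumed value)

# Supervised (delta-rule) update: the teacher signal t is known, and the
# weight change is driven by the error between t and the unit's output.
t = 1.0                   # teacher signal / correct response
y = np.dot(w, x)          # linear output of the PE
w_supervised = w + eta * (t - y) * x

# Unsupervised (Hebbian) update: no teacher signal is available; the weights
# simply strengthen correlations between input and output activities.
y = np.dot(w, x)
w_hebbian = w + eta * y * x
```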
A wide variety of learning algorithms is now available for solving both supervised
and unsupervised learning problems, most of which have been designed for specific net-
work architectures. Most of these learning algorithms, especially those intended for supervised
learning in feedforward networks, have their roots in traditional function-minimisation
procedures and can be classified as either local or global search heuristics (error-minimisation
strategies). Learning algorithms are termed local if the computations needed to update
each weight in the CNN can be performed using information that is available on a local basis to
that specific weight. This requirement could, for example, be motivated by a desire to implement
learning algorithms in parallel hardware. Local minimisation algorithms (such as those based
on gradient-descent, conjugate-gradient and quasi-Newton methods) are fast but will often
converge to a local minimum, with an increased chance of producing a sub-optimal solution. In contrast,
global minimisation algorithms, such as simulated annealing and evolutionary computation,
possess heuristic strategies that enable them to escape from local minima. However, each
class of algorithm is weak where the other is strong: good local search procedures tend to be
poor at global search, and vice versa. Gradient information illustrates this point. Such knowledge
is not just useful but often of prime importance in local search procedures, yet it is not
exploited in simulated annealing or evolutionary computation. Conversely, gradient-descent
algorithms, even with numerous multistart possibilities, are prone to becoming trapped in local
minima and will thus often produce sub-optimal solutions; that is, they are weak in global
search. Designing more efficient algorithms for CNN learning is thus an active research topic
for neurocomputing specialists.
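The trade-off between local and global search can be illustrated with a short sketch: plain gradient descent on a non-convex error surface converges to whichever local minimum lies downhill from its starting point, and a multistart wrapper simply repeats that local search from several random initialisations. The error function and all parameter values below are assumptions chosen purely for illustration.

```python
import numpy as np

def error(w):
    # Toy non-convex error surface with several local minima (assumed form).
    return np.sin(3.0 * w) + 0.1 * w ** 2

def grad(w):
    # Analytic gradient of the toy error surface.
    return 3.0 * np.cos(3.0 * w) + 0.2 * w

def gradient_descent(w0, eta=0.01, steps=500):
    # Local search: follows the downhill gradient from a single start point
    # and therefore settles in whichever local minimum lies nearby.
    w = w0
    for _ in range(steps):
        w -= eta * grad(w)
    return w

# Multistart heuristic: repeat the local search from many random initial
# weights and keep the best solution found; this improves, but does not
# guarantee, global search behaviour.
rng = np.random.default_rng(0)
starts = rng.uniform(-5.0, 5.0, size=20)
best = min((gradient_descent(w0) for w0 in starts), key=error)
print(f"best weight: {best:.3f}, error: {error(best):.3f}")
```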
One critical issue for the successful application of a CNN concerns the complex relationship
that exists between learning (training) and generalisation. It is important to stress that the ultimate
goal of network training is not to learn an exact representation of the training data, but rather
to build a model of the underlying process(es) which generated that data, in order to achieve
good generalisation (i.e. good out-of-sample performance). One simple method for optimising
the generalisation performance of a neural network model is to control its effective complexity,
with complexity in this case being measured in terms of the number of network parameters. This
problem of finding the optimal complexity for a neural network model, although often considered
crucial for a successful application, has until now been somewhat neglected in most CNN-based
GC applications.
In principle, there are three main approaches that can be used to control overfitting
(see Fischer 2005), the first and third of which are sketched in the example below:
1. Regularisation techniques, that is, adding an extra term to the error function that is designed
to penalise those mappings which are not smooth
2. Pruning techniques, that is, starting with an oversized network and removing inconsequential
links or nodes using automated procedures (e.g. Fischer et al. 1994; Abrahart et al. 1999)
3. Cross-validation techniques to determine when to stop training, for example, the early
stopping heuristic demonstrated in Fischer and Gopal (1994)
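As a hedged illustration of the first and third approaches, the sketch below adds a weight-decay penalty to a sum-of-squares error function and stops training once the validation error has not improved for a fixed number of epochs. The simple linear model, the penalty coefficient, the learning rate and the patience value are all assumptions made for the example; they are not prescriptions from the sources cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(80, 3)), rng.normal(size=80)
X_val, y_val = rng.normal(size=(20, 3)), rng.normal(size=20)

lam = 0.01       # weight-decay (regularisation) coefficient, assumed
eta = 0.01       # learning rate, assumed
patience = 10    # early-stopping patience in epochs, assumed

def penalised_error(w, X, y):
    # Sum-of-squares error plus a penalty term that discourages large
    # (non-smooth) weight values.
    residual = X @ w - y
    return 0.5 * residual @ residual + 0.5 * lam * w @ w

w = np.zeros(3)
best_val, best_w, wait = np.inf, w.copy(), 0
for epoch in range(1000):
    # Gradient-descent step on the regularised training error.
    grad = X_train.T @ (X_train @ w - y_train) + lam * w
    w -= eta * grad
    val = penalised_error(w, X_val, y_val)
    if val < best_val:          # validation error still improving
        best_val, best_w, wait = val, w.copy(), 0
    else:                       # stop once it has stalled for `patience` epochs
        wait += 1
        if wait >= patience:
            break
```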