3.5.4 Training, Stopping and Evaluation
Originally, the simple principle was accepted that the network should be trained
until it has learnt its task. This is difficult to verify, because there is no
direct way of checking it. The general statement that a high enough number
of iterations, or training steps, is sufficient, in the sense that the network has
learnt well enough to be a qualified expert in a specific domain, say in forecasting,
does not hold. Thus far, at least theoretically, reaching the global minimum of the
objective function is accepted as the merit of training efficiency, so that in
approaching this minimum the error function steadily decreases until the
minimum has been reached. Finding that the error function decreases no further
would then be an indication to stop the training process.
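Such a stopping rule can be written down directly. The Python sketch below is one possible reading of it, not a prescription from the text: `train_step` and `evaluate_error` are hypothetical caller-supplied callables, and training halts once the error has failed to decrease by more than a small tolerance for a fixed number of consecutive steps.

```python
def train_until_no_decrease(train_step, evaluate_error,
                            max_steps=10_000, tol=1e-6, patience=10):
    """Run `train_step` until `evaluate_error` stops decreasing.

    Both arguments are hypothetical callables supplied by the caller.
    Stops once the error has not improved by more than `tol` for
    `patience` consecutive steps, or after `max_steps` steps.
    """
    best_error = float("inf")
    stagnant = 0
    for _ in range(max_steps):
        train_step()                  # one weight update
        error = evaluate_error()      # current value of the error function
        if best_error - error > tol:  # error is still decreasing
            best_error = error
            stagnant = 0
        else:                         # no meaningful decrease this step
            stagnant += 1
            if stagnant >= patience:
                break                 # no further decrease: stop training
    return best_error
```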
In practice, to find the global minimum, network training can require a number
of repeated training trials with various initial weight values. After each training run
the results have to be evaluated and compared with those achieved in the
previous runs, in order to select the best run. Some researchers have
centred their attention on the problem of determining a priori the maximum
number of training runs required. Iyer and Rhinehart (2000) have
developed an analytical procedure for determining the lowest number of
training runs that is sufficient, within a certain level of confidence, to ensure
that the best run is among them. The procedure is based on the
weakest-link-in-the-chain analysis described by Bethea and Rhinehart (1991).
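As a minimal sketch of this restart-and-select procedure, assuming a hypothetical `train_network(seed)` routine that initializes the weights from the given seed, trains to convergence, and returns the trained network together with its final error:

```python
import random

def best_of_random_starts(train_network, n_runs, seed=0):
    """Train `n_runs` times from different random initial weights
    and keep the run with the lowest final error.

    `train_network(run_seed)` is a hypothetical callable returning
    a `(network, final_error)` pair for one training run.
    """
    rng = random.Random(seed)
    best_net, best_error = None, float("inf")
    for _ in range(n_runs):
        run_seed = rng.randrange(2**32)       # fresh random initial weights
        net, error = train_network(run_seed)  # one complete training run
        if error < best_error:                # compare with previous runs
            best_net, best_error = net, error
    return best_net, best_error
```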
The authors use the cumulative distribution function for the weakest link in a
set of $N$ training runs starting with random initial weight values,

$$F_w(a) = 1 - [1 - F_x(a)]^N .$$
This, rearranged as

$$F_x(a) = 1 - [1 - F_w(a)]^{1/N} ,$$

represents the probability that any single optimization run has an error value
$x \le a$. The two relations, taken simultaneously, define the required number
of random starts as
$$N = \frac{\ln[1 - F_w(a)]}{\ln[1 - F_x(a)]} .$$
For example, if, at the 99% confidence level, the best of the random starts should
yield one of the best 20% of values for the sum of squared errors, then the required
number of random starts will be

$$N = \frac{\ln(1 - 0.99)}{\ln(1 - 0.20)} \approx 20.6 ,$$

that is, about 21 random starts.