3.5.4 Training, Stopping and Evaluation
Originally, the simple principle was accepted that the network should be trained
until it has learnt its task. This is difficult to verify, because there is no
direct way of checking it. The general statement that a high enough number
of iterations, or training steps, is sufficient, in the sense that the network has
learnt well enough to be a qualified expert in a specific domain, say in forecasting,
does not hold. Thus far, at least theoretically, reaching the global minimum of the
objective function is accepted as the merit of training efficiency, so that in
approaching this minimum the error function steadily decreases until the
minimum has been reached. Finding that the error function decreases no further
would then be an indication to stop the training process.
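Such a stopping rule can be written down directly. The Python sketch below is one possible reading of it, not a prescription from the text: `train_step` and `evaluate_error` are hypothetical caller-supplied callables, and training halts once the error has failed to decrease by more than a small tolerance for a fixed number of consecutive steps.

```python
def train_until_no_decrease(train_step, evaluate_error,
                            max_steps=10_000, tol=1e-6, patience=10):
    """Run `train_step` until `evaluate_error` stops decreasing.

    Both arguments are hypothetical callables supplied by the caller.
    Stops once the error has not improved by more than `tol` for
    `patience` consecutive steps, or after `max_steps` steps.
    """
    best_error = float("inf")
    stagnant = 0
    for _ in range(max_steps):
        train_step()                  # one weight update
        error = evaluate_error()      # current value of the error function
        if best_error - error > tol:  # error is still decreasing
            best_error = error
            stagnant = 0
        else:                         # no meaningful decrease this step
            stagnant += 1
            if stagnant >= patience:
                break                 # no further decrease: stop training
    return best_error
```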
In practice, to find the global minimum, network training can require a number
of repeated training trials with various initial weight values. After each training run
the results have to be evaluated and compared with those achieved in the
previous runs, in order to select the best run. Some researchers have
centred their attention on the problem of determining a priori the maximum
number of training runs required. Iyer and Rhinehart (2000) have
developed an analytical procedure for determining the lowest number of
training runs that is sufficient, within a certain level of confidence, to ensure
that the best run is among them. The procedure is based on the
weakest-link-in-the-chain analysis described by Bethea and Rhinehart (1991).
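As a minimal sketch of this restart-and-select procedure, assuming a hypothetical `train_network(seed)` routine that initializes the weights from the given seed, trains to convergence, and returns the trained network together with its final error:

```python
import random

def best_of_random_starts(train_network, n_runs, seed=0):
    """Train `n_runs` times from different random initial weights
    and keep the run with the lowest final error.

    `train_network(run_seed)` is a hypothetical callable returning
    a `(network, final_error)` pair for one training run.
    """
    rng = random.Random(seed)
    best_net, best_error = None, float("inf")
    for _ in range(n_runs):
        run_seed = rng.randrange(2**32)       # fresh random initial weights
        net, error = train_network(run_seed)  # one complete training run
        if error < best_error:                # compare with previous runs
            best_net, best_error = net, error
    return best_net, best_error
```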
The authors use the cumulative distribution function for the weakest link in a
set of $N$ training runs starting with random initial weight values,

$$F_w(a) = 1 - [1 - F_x(a)]^N .$$
This, rearranged as

$$F_x(a) = 1 - [1 - F_w(a)]^{1/N} ,$$

represents the probability that any single optimization run has an error value
$x \le a$. The two relations, taken simultaneously, define the required number
of random starts as
$$N = \frac{\ln[1 - F_w(a)]}{\ln[1 - F_x(a)]} .$$
For example, if, at the 99% confidence level, the best of the random starts should
yield one of the best 20% of values for the sum of squared errors, then the required
number of random starts will be

$$N = \frac{\ln(1 - 0.99)}{\ln(1 - 0.20)} \approx 20.6 ,$$

that is, about 21 random starts.