The training progress measured over a training strip of length k is defined as

P_k(t) = 1000 \cdot \left( \frac{\sum_{t'=t-k+1}^{t} E_{tr}(t')}{k \cdot \min_{t'=t-k+1}^{t} E_{tr}(t')} - 1 \right)

with t' = t-k+1, ..., t and the training strip length k.
• Stop when the generalization error has increased in s successive strips.
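A minimal sketch of how these two quantities could be computed is given below; the function names, the strip bookkeeping, and the example error histories are illustrative assumptions rather than part of the original criteria.

import numpy as np

def training_progress(train_errors, k):
    """P_k(t): how much the training error summed over the last strip of k
    epochs exceeds k times the minimum error within that strip, in parts
    per thousand; values near zero indicate that training has flattened out."""
    strip = np.asarray(train_errors[-k:], dtype=float)
    return 1000.0 * (strip.sum() / (k * strip.min()) - 1.0)

def stop_on_successive_increase(strip_val_errors, s):
    """Stop when the end-of-strip generalization (validation) error has
    increased in s successive strips."""
    if len(strip_val_errors) < s + 1:
        return False
    recent = strip_val_errors[-(s + 1):]
    return all(recent[i + 1] > recent[i] for i in range(s))

# Illustrative error histories (one value per epoch, one per strip end).
train_errors = [0.90, 0.70, 0.55, 0.46, 0.40, 0.37, 0.36, 0.355, 0.352, 0.351]
strip_val_errors = [0.80, 0.60, 0.50, 0.52, 0.55, 0.58]

print(training_progress(train_errors, k=5))            # small value: little progress left
print(stop_on_successive_increase(strip_val_errors, s=3))  # True: stop training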
In order to investigate the validity of the criteria, Prechelt (1998) conducted 1296 training runs, producing 18144 stopping criteria. In these experiments, 270 of the records from 125 different runs automatically reached the 3000-epoch limit without any of the stopping criteria being met.
We will now consider the problem of network overtraining, or network
overfitting, in more detail. Both the problem of overfitting and the opposite
problem of underfitting arise as a consequence of improperly stopping the training.
Both should therefore be prevented, because each of them lowers the
generalization capability of the trained network. For example, if a network to be
trained is less complex than the task to be learnt, then the network, after being
trained, can suffer from underfitting and may therefore poorly identify the
features within a large training data set. Conversely, a network that is too complex
can, after being trained, suffer from overfitting and may therefore extract the
features within the training set along with the superimposed noise. As a consequence,
an overly complex network can produce predictions that are not acceptable.
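A small numerical illustration of this contrast, using polynomial degree merely as a stand-in for network complexity, might look as follows; the data, degrees, and noise level are assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy training data

x_test = np.linspace(0.0, 1.0, 200)                 # dense, noise-free test grid
y_test = np.sin(2.0 * np.pi * x_test)

for degree in (1, 3, 10):                           # too simple / adequate / too complex
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# The simplest model typically shows high error everywhere (underfitting), while the
# most complex one shows a low training error but a higher test error (overfitting).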
Network complexity is primarily related to the number of weights. The term is
used in connection with model selection for prediction, in the sense that the
required prediction accuracy of a network determines its complexity. This is the starting
point of network model selection: how many weights of what size (and how
many hidden units) should the model have in order to achieve the desired
prediction accuracy without overfitting (or at least with low overfitting)?
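One simple way to approach this question in practice is to train candidate networks of increasing size and compare their errors on held-out data. The sketch below does so with scikit-learn's MLPRegressor; the data set, the candidate sizes, and the split are assumptions made only for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2.0 * np.pi * x).ravel() + rng.normal(scale=0.1, size=200)

x_tr, y_tr = x[:150], y[:150]           # training set
x_va, y_va = x[150:], y[150:]           # held-out set for estimating generalization

best_size, best_mse = None, np.inf
for n_hidden in (1, 2, 4, 8, 16, 32):   # candidate model complexities
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=5000, random_state=0)
    net.fit(x_tr, y_tr)
    val_mse = float(np.mean((net.predict(x_va) - y_va) ** 2))
    print(f"{n_hidden:2d} hidden units: validation MSE {val_mse:.4f}")
    if val_mse < best_mse:
        best_size, best_mse = n_hidden, val_mse

print("selected number of hidden units:", best_size)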
From the statistical point of view, underfitting and overfitting are related to
the statistical bias and the statistical variance they produce. These strongly
influence the generalization capability of the trained network as follows:
• the statistical bias is related to the degree of target-function fitting and
restricts the network complexity, but does not account for the generalization
of the trained network;
• the statistical variance, which is the deviation of the network's learning efficiency
across the set of training data, accounts for the generalization of the trained
network.
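Both quantities can be estimated empirically by retraining the same model class on many freshly drawn training sets and examining its predictions at fixed test inputs. The sketch below does this with polynomial models as a stand-in for networks of low and high complexity; all numbers are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
x_test = np.linspace(0.0, 1.0, 50)
f_test = np.sin(2.0 * np.pi * x_test)             # true target values at test inputs

def fit_once(degree):
    """Draw a fresh noisy training set, fit one model, predict at x_test."""
    x = rng.uniform(0.0, 1.0, 40)
    y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=40)
    return np.polyval(np.polyfit(x, y, degree), x_test)

for degree in (1, 9):                             # low vs high model complexity
    preds = np.array([fit_once(degree) for _ in range(200)])
    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)  # squared statistical bias
    variance = np.mean(preds.var(axis=0))                # statistical variance
    print(f"degree {degree}: bias^2 {bias2:.4f}, variance {variance:.4f}")

# The simple model shows high bias and low variance; the complex model the opposite.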
For instance, underfitting produces a very high bias at the network outputs, whereas
overfitting produces a large variance. The difficulty of reducing them simultaneously,
or of balancing them in the process of learning, which is essential for achieving the
highest possible degree of generalization, is known as the bias-variance dilemma.
The dilemma is to be understood as follows: the bias of a neural network with a
high fitting performance across the given training set is very low, but its
variance is very high. By reducing the variance, the data-fitting
performance of the network will decrease. As a consequence, a trade-off between
bias and variance has to be found.