The training progress measured over a training strip of length k is defined as

P_k(t) = 1000 \cdot \left( \frac{\sum_{t'=t-k+1}^{t} E_{tr}(t')}{k \cdot \min_{t'=t-k+1}^{t} E_{tr}(t')} - 1 \right)

with t' = t-k+1, ..., t and the training strip length k.
• Stop when the generalization error has increased in s successive strips.
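A minimal sketch of how these two quantities could be computed is given below; the function names, the strip bookkeeping, and the example error histories are illustrative assumptions rather than part of the original criteria.

import numpy as np

def training_progress(train_errors, k):
    """P_k(t): how much the training error summed over the last strip of k
    epochs exceeds k times the minimum error within that strip, in parts
    per thousand; values near zero indicate that training has flattened out."""
    strip = np.asarray(train_errors[-k:], dtype=float)
    return 1000.0 * (strip.sum() / (k * strip.min()) - 1.0)

def stop_on_successive_increase(strip_val_errors, s):
    """Stop when the end-of-strip generalization (validation) error has
    increased in s successive strips."""
    if len(strip_val_errors) < s + 1:
        return False
    recent = strip_val_errors[-(s + 1):]
    return all(recent[i + 1] > recent[i] for i in range(s))

# Illustrative error histories (one value per epoch, one per strip end).
train_errors = [0.90, 0.70, 0.55, 0.46, 0.40, 0.37, 0.36, 0.355, 0.352, 0.351]
strip_val_errors = [0.80, 0.60, 0.50, 0.52, 0.55, 0.58]

print(training_progress(train_errors, k=5))            # small value: little progress left
print(stop_on_successive_increase(strip_val_errors, s=3))  # True: stop training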
In order to investigate the validity of the criteria, Prechelt (1998) conducted 1296 training runs, producing 18144 stopping criteria. In these experiments, 270 of the records from 125 different runs automatically reached the 3000-epoch limit without any of the stopping criteria being met.
We will now consider the problem of network overtraining, or network
overfitting, in more detail. Both the problem of overfitting and the opposite
problem of underfitting arise as a consequence of improperly stopping the training.
Both should therefore be prevented, because each of them lowers the
generalization capability of the trained network. For example, if a network to be
trained is less complex than the task to be learnt, then the network, after being
trained, can suffer from underfitting and may therefore poorly identify the
features within a large training data set. Conversely, a network that is too complex
can, after being trained, suffer from overfitting and may therefore extract the
features within the training set along with the superimposed noise. As a consequence,
an overly complex network can produce predictions that are not acceptable.
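A small numerical illustration of this contrast, using polynomial degree merely as a stand-in for network complexity, might look as follows; the data, degrees, and noise level are assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy training data

x_test = np.linspace(0.0, 1.0, 200)                 # dense, noise-free test grid
y_test = np.sin(2.0 * np.pi * x_test)

for degree in (1, 3, 10):                           # too simple / adequate / too complex
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# The simplest model typically shows high error everywhere (underfitting), while the
# most complex one shows a low training error but a higher test error (overfitting).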
Network complexity is primarily related to the number of weights. The term is
used in connection with model selection for prediction, in the sense that the
required prediction accuracy of a network determines its complexity. This is the starting
point of network model selection: how many weights of what size (and how
many hidden units) should the model have in order to achieve the desired
prediction accuracy without overfitting (or at least with low overfitting)?
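One simple way to approach this question in practice is to train candidate networks of increasing size and compare their errors on held-out data. The sketch below does so with scikit-learn's MLPRegressor; the data set, the candidate sizes, and the split are assumptions made only for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2.0 * np.pi * x).ravel() + rng.normal(scale=0.1, size=200)

x_tr, y_tr = x[:150], y[:150]           # training set
x_va, y_va = x[150:], y[150:]           # held-out set for estimating generalization

best_size, best_mse = None, np.inf
for n_hidden in (1, 2, 4, 8, 16, 32):   # candidate model complexities
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=5000, random_state=0)
    net.fit(x_tr, y_tr)
    val_mse = float(np.mean((net.predict(x_va) - y_va) ** 2))
    print(f"{n_hidden:2d} hidden units: validation MSE {val_mse:.4f}")
    if val_mse < best_mse:
        best_size, best_mse = n_hidden, val_mse

print("selected number of hidden units:", best_size)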
From the statistical point of view, underfitting and overfitting are related to
the statistical bias and the statistical variance they produce. These strongly
influence the generalization capability of the trained network as follows:
• the statistical bias is related to the degree of target-function fitting and
restricts the network complexity, but does not account for the generalization
of the trained network;
• the statistical variance, which is the deviation of the network's learning efficiency
across the set of training data, accounts for the generalization of the trained
network.
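Both quantities can be estimated empirically by retraining the same model class on many freshly drawn training sets and examining its predictions at fixed test inputs. The sketch below does this with polynomial models as a stand-in for networks of low and high complexity; all numbers are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
x_test = np.linspace(0.0, 1.0, 50)
f_test = np.sin(2.0 * np.pi * x_test)             # true target values at test inputs

def fit_once(degree):
    """Draw a fresh noisy training set, fit one model, predict at x_test."""
    x = rng.uniform(0.0, 1.0, 40)
    y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=40)
    return np.polyval(np.polyfit(x, y, degree), x_test)

for degree in (1, 9):                             # low vs high model complexity
    preds = np.array([fit_once(degree) for _ in range(200)])
    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)  # squared statistical bias
    variance = np.mean(preds.var(axis=0))                # statistical variance
    print(f"degree {degree}: bias^2 {bias2:.4f}, variance {variance:.4f}")

# The simple model shows high bias and low variance; the complex model the opposite.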
For instance, underfitting produces a very high bias at the network outputs, whereas
overfitting produces a large variance. The difficulty of reducing them simultaneously,
or of balancing them in the process of learning, which is essential for achieving the
highest possible degree of generalization, is known as the bias-variance dilemma.
The dilemma is to be understood as follows: the bias of a neural network with a
high fitting performance across the given training set is very low, but its
variance is very high. By reducing the variance, the data-fitting
performance of the network will decrease. As a consequence, a trade-off between
bias and variance has to be found.