Overfitting and the Bias-Variance Dilemma
Since the accuracy of the uniform approximation of a given function by a
neural network increases as the number of hidden neurons increases, a naïve
design methodology would consist in building the network with as many neu-
rons as possible. However, as mentioned above, in real engineering problems,
the network is not required to approximate a known function uniformly, but to
approximate an unknown function (the regression function) from a finite num-
ber of experimental points (the training set); therefore, the network should
not only fit the experimental points as closely as possible (in the least squares
sense), but it should also generalize efficiently, i.e., give a satisfactory re-
sponse to situations that are not present in the training set. The difficulty
here is that there is no operational definition of the meaning of "satisfactory",
since the regression function is unknown: the problem of generalization is an
ill-posed problem. Therefore, the design problem is the following:
- If the neural network has too many parameters (it is said to be over-
  parameterized), it will be too "flexible," so that its output will fit very
  accurately all points of the training set (including the noise present in
  these points), but it will provide meaningless responses in situations that
  are not present in the training set. That is known as overfitting.
- By contrast, a neural network with too few parameters will not be complex
  enough to match the complexity of the (unknown) regression function, so
  that it will not be able to learn the training data.
This dilemma, known as the bias-variance dilemma, is the basic problem that
the model designer is faced with.
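The dilemma can be made concrete with a small numerical sketch. Here polynomials of increasing degree stand in for networks with increasing numbers of parameters (the sine target, noise level, and degrees are illustrative assumptions, not from the text): the over-parameterized model drives its training error down by fitting the noise, while the parsimonious model keeps training and validation errors comparable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an "unknown" regression function (a sine, for illustration).
def sample(n, noise=0.2):
    x = np.linspace(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)

x_train, y_train = sample(15)   # small training set
x_val, y_val = sample(50)       # validation set: distinct observations

# Polynomial degree plays the role of the number of hidden neurons.
results = {}
for degree in (3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    val_mse = np.mean((y_val - np.polyval(coeffs, x_val)) ** 2)
    results[degree] = (train_mse, val_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, val MSE {val_mse:.4f}")
```

The degree-12 model necessarily achieves a lower training error than the degree-3 model, since its parameter set is richer; the validation error, computed on points the fit has never seen, is what reveals whether that extra flexibility helped or merely modeled the noise.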
Figure 1.14 shows the results obtained after training, on the same training
set, two different networks with sigmoid activation functions but different
numbers of hidden neurons (hence of parameters): clearly, the most
parsimonious model (i.e., the model with the smallest number of parameters)
generalizes best. In practice, the number of parameters should be small with
respect to the number of elements of the training set. The parsimony of neural
networks with sigmoid activation functions is a valuable asset in the design of
models that do not exhibit overfitting.
Figure 1.14 shows clearly which candidate neural network is most ap-
propriate. When the model has several inputs, the result cannot be exhib-
ited graphically in such a straightforward fashion: a quantitative performance
index must be defined. The most popular way of estimating such an index is
the following: in addition to the training set, one should build a validation
set, made of observations that are distinct from those of the training set, from
which a performance index is computed. The most frequently used criterion
is the mean square error on the validation set (VMSE), defined as:
\[
\mathrm{VMSE} = \frac{1}{N_V} \sum_{k=1}^{N_V} \left[ y_k - g(x_k, w) \right]^2
\]
where $N_V$ is the number of observations in the validation set.
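A minimal sketch of this computation follows. The one-hidden-neuron model `g` and the validation data are purely hypothetical placeholders; any model with the signature `g(x, w)` would do.

```python
import numpy as np

def vmse(y, x, g, w):
    """Mean square error of the model g(x, w) over a validation set (x, y)."""
    residuals = y - g(x, w)
    return float(np.mean(residuals ** 2))

# Hypothetical model, just to exercise the formula:
# g(x, w) = w[1] * tanh(w[0] * x)
def g(x, w):
    return w[1] * np.tanh(w[0] * x)

x_val = np.array([0.0, 0.5, 1.0])     # validation inputs (illustrative)
y_val = np.array([0.0, 0.45, 0.75])   # measured outputs (illustrative)
print(f"VMSE = {vmse(y_val, x_val, g, [1.0, 1.0]):.5f}")
```

By construction the VMSE is non-negative, and it vanishes exactly when the model reproduces every validation observation.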