[Figure: three panels, (a), (b), and (c), each plotting Y against X]
Figure 4.7 Regression Model Overfitting
To avoid overfitting, the training process needs to be stopped before reaching
an overfit state. For example, when training on the dataset of Figure 4.7a,
the model would very likely progress through a state similar to Figure 4.7b
before reaching the state of Figure 4.7c. The trick then is to recognize when the
model has reached a point similar to Figure 4.7b and stop. Note that the value
of r² for Figure 4.7c, if computed, would be significantly better than that for
Figure 4.7b.
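The note about r² can be checked with a small sketch. The data, the noise level, and the polynomial degrees below are illustrative assumptions, not the dataset behind Figure 4.7. The degree-2 fit stands in for a model like Figure 4.7b and the degree-8 fit for one like Figure 4.7c. Because r² is computed here on the same points used for fitting, the more flexible model cannot score lower, even though it is chasing noise.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical training data: a quadratic trend plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Fit a modest model (degree 2) and a very flexible one (degree 8),
# then score each on the SAME points it was fitted to.
r2_by_degree = {}
for degree in (2, 8):
    coeffs = np.polyfit(x, y, degree)
    r2_by_degree[degree] = r_squared(y, np.polyval(coeffs, x))

print(r2_by_degree)
```

A high r² on the training observations is therefore not evidence of a better model; it says nothing about how the fit behaves on observations it has not seen.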
One way to recognize the point at which overfitting begins is to first split the
dataset into a subset of training observations and a subset of validation
observations. Start the model building process using the training dataset. At
regular intervals, as the training progresses, pause to compute a pair of model
performance measures: one using the training set and the other using the
validation set. For example, for a classification model, compute the error rate
once using the training set and a second time using the validation set. When
training first starts, you are likely to see the performance measure improve
for both datasets. However, as training progresses you may see the performance
measure continue to improve with respect to the training dataset while the
validation measure gets worse. When this happens, the model is very likely
overfitting.
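The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it uses a regression model and mean squared error rather than a classification error rate, the data are synthetic, and the function name train_with_early_stopping and the parameters check_every and patience are hypothetical names chosen for this example. Training pauses every check_every steps to measure both errors, remembers the weights with the lowest validation error, and stops once the validation error has failed to improve for patience consecutive checks.

```python
import numpy as np

def mse(X, y, w):
    """Mean squared error of the linear-in-features model X @ w."""
    return float(np.mean((X @ w - y) ** 2))

def train_with_early_stopping(X_train, y_train, X_val, y_val,
                              lr=0.05, max_steps=20000,
                              check_every=100, patience=5):
    w = np.zeros(X_train.shape[1])
    best_w, best_val, best_step = w.copy(), mse(X_val, y_val, w), 0
    checks_without_improvement = 0
    history = []  # (step, training MSE, validation MSE) at each pause

    for step in range(1, max_steps + 1):
        # One gradient-descent step on the training MSE only.
        grad = 2.0 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad

        if step % check_every == 0:
            # Pause and compute the pair of performance measures.
            train_err = mse(X_train, y_train, w)
            val_err = mse(X_val, y_val, w)
            history.append((step, train_err, val_err))
            if val_err < best_val:
                best_w, best_val, best_step = w.copy(), val_err, step
                checks_without_improvement = 0
            else:
                checks_without_improvement += 1
                if checks_without_improvement >= patience:
                    break  # validation error stopped improving
    return best_w, best_step, history

# Hypothetical data: a noisy curve, expanded into degree-9 polynomial features.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=40)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)
X = np.vander(x, 10, increasing=True)

# Split into training observations and validation observations.
X_train, y_train = X[:28], y[:28]
X_val, y_val = X[28:], y[28:]

best_w, best_step, history = train_with_early_stopping(
    X_train, y_train, X_val, y_val)
print("best validation error at step", best_step)
```

Returning the weights from the best validation check, rather than the final weights, is a design choice: by the time the stopping rule fires, the model has already trained past the point where the validation measure was lowest.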
Moving beyond local optima
One of the problems encountered in the development of early artificial neural
networks was that of local optima. The initial neuron weights are random
values. As the neural network is training, it takes small steps (weight
adjustments) in a direction that will produce the best improvement. These steps may
move it toward a solution that is not necessarily the best possible. To illustrate,
consider the maps of Figure 4.8. Although this is not a perfect representation of
an ANN search space, it sufficiently depicts the process.
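The same effect can be reproduced numerically in one dimension. The sketch below is an illustration under assumptions, not a model of an actual ANN: the loss function, the step size, and the two starting weights are made up for the example. The loss has two valleys, and taking small downhill steps from each starting point settles in whichever valley is nearer, so one run ends at a solution that is not the best possible.

```python
def loss(w):
    """Hypothetical one-weight loss with two valleys of different depth."""
    return w**4 - 3.0 * w**2 + w

def grad(w):
    """Derivative of loss with respect to w."""
    return 4.0 * w**3 - 6.0 * w + 1.0

def descend(w, lr=0.01, steps=2000):
    """Take small steps in the direction that most reduces the loss."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Two different starting weights, standing in for random initial values.
w_left = descend(-2.0)   # settles in the deeper valley
w_right = descend(2.0)   # settles in the shallower valley

print(w_left, loss(w_left))
print(w_right, loss(w_right))
```

Both runs stop where the gradient is essentially zero, so neither can tell from local information alone whether a better solution exists elsewhere.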
Think of Figure 4.8a as representing the search space of ANN weights where
the darker areas represent regions of better performance. The optimal location is
marked “Best”. The random starting position is marked by the “X”. At each
 