Prediction Algorithms for Data Mining - Visual Data Mining: The VisMiner Approach - page 94

Figure 4.7 Regression Model Overfitting: panels (a), (b), and (c), each plotting Y against X

To avoid overfitting, the training process needs to be stopped before reaching

an overfit state. For example, when training on the dataset of Figure 4.7a,

the model would very likely progress through a state similar to Figure 4.7b

before reaching the state of Figure 4.7c. The trick then is to recognize when the

model has reached a point similar to Figure 4.7b and stop. Note that the value

of r² for Figure 4.7c, if computed, would be significantly better than that for

Figure 4.7b.
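The point about r² can be checked with a small sketch. The data below are invented for illustration (a straight line plus zigzag noise), and the Lagrange interpolant is only a stand-in for an extreme overfit like Figure 4.7c; neither comes from the book. The interpolant passes through every training point, so its training r² is perfect, yet it scores worse than a plain least-squares line on held-out points.

```python
# Illustrative data (assumed, not from the book): y = 2x + 1 plus zigzag noise.
train_x = [float(i) for i in range(8)]
train_y = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5)
           for i, x in enumerate(train_x)]
# Held-out points halfway between the training points, without noise.
val_x = [i + 0.5 for i in range(7)]
val_y = [2 * x + 1 for x in val_x]


def fit_line(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (an extreme overfit)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def r_squared(predict, xs, ys):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


line = fit_line(train_x, train_y)
overfit = fit_interpolant(train_x, train_y)

r2_train_line = r_squared(line, train_x, train_y)
r2_train_overfit = r_squared(overfit, train_x, train_y)
r2_val_line = r_squared(line, val_x, val_y)
r2_val_overfit = r_squared(overfit, val_x, val_y)

print("training   r^2: line", r2_train_line, "interpolant", r2_train_overfit)
print("validation r^2: line", r2_val_line, "interpolant", r2_val_overfit)
```

A high r² on the training data therefore says little by itself about how well the model will predict new observations.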

One way to recognize the point at which overfitting begins is to first split the

dataset into a subset of training observations and a subset of validation

observations. Start the model building process using the training dataset. At

regular intervals, as the training progresses, pause to compute a pair of model

performance measures - one using the training set and the other based on the

validation set. For example, for a classification model, compute the error rate

once using the training set and a second time using the

validation set. When training first starts, you are likely to see the performance

measure improve for both datasets. However, as training progresses you

may see the performance measure improve with respect to the training dataset,

while the validation measure gets worse. When this happens, the model is very

likely to be overfit.
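The monitoring procedure described above can be sketched as a loop that tracks the validation measure and stops once it has failed to improve for a few checks. Everything here is a hypothetical illustration: `ToyModel`, its quadratic error curves, and the `patience` parameter are assumptions rather than part of VisMiner. The toy model has a single weight; its training error is lowest at w = 3 and its validation error is lowest at w = 2, so continued training eventually hurts validation performance.

```python
class ToyModel:
    """One-weight stand-in for a real model (assumed for illustration)."""

    def __init__(self):
        self.w = 0.0

    def train_step(self, lr=0.1):
        # One gradient step on the training error (w - 3)^2.
        self.w -= lr * 2.0 * (self.w - 3.0)

    def train_error(self):
        return (self.w - 3.0) ** 2

    def val_error(self):
        return (self.w - 2.0) ** 2


def fit_with_early_stopping(model, max_epochs=50, patience=3):
    """Train until the validation error has not improved for `patience` epochs.

    Returns the best epoch and the (epoch, train_error, val_error) history.
    The model is rolled back to the weight from the best epoch.
    """
    best_epoch, best_w, best_val = 0, model.w, model.val_error()
    history = [(0, model.train_error(), best_val)]
    for epoch in range(1, max_epochs + 1):
        model.train_step()
        train_err, val_err = model.train_error(), model.val_error()
        history.append((epoch, train_err, val_err))
        if val_err < best_val:
            # Validation measure improved: remember this state.
            best_epoch, best_w, best_val = epoch, model.w, val_err
        elif epoch - best_epoch >= patience:
            # Training error keeps falling, validation error does not: stop.
            break
    model.w = best_w
    return best_epoch, history


model = ToyModel()
best_epoch, history = fit_with_early_stopping(model)
for epoch, train_err, val_err in history:
    print(epoch, round(train_err, 4), round(val_err, 4))
print("best epoch:", best_epoch)
```

Waiting a few checks before stopping is a design choice: a single bad validation reading may be noise, so the sketch keeps the best state seen so far and rolls back to it once the decline looks persistent.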

Moving beyond local optima

One of the problems encountered in the development of early artificial neural

networks was that of local optima. The initial neuron weights are random

values. As the neural network trains, it takes small steps (weight adjustments)

in the direction that will produce the best immediate improvement. These steps may

move it toward a solution that is not necessarily the best possible. To illustrate,

consider the maps of Figure 4.8. Although this is not a perfect representation of

an ANN search space, it sufficiently depicts the process.
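The same effect can be seen numerically in one dimension. The sketch below is an assumption-laden illustration, not an ANN: the function `f`, the step size, and the two starting points are all invented for the example. Gradient descent always steps downhill from wherever it starts, so one start settles in a shallow local minimum while the other reaches the deeper one.

```python
def f(x):
    """Hypothetical error surface with a local and a global minimum."""
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def descend(start, lr=0.01, steps=1000):
    """Plain gradient descent: repeated small steps in the downhill direction."""
    x = start
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_right = descend(2.0)    # starts on the right, ends in the local minimum
x_left = descend(-2.0)    # starts on the left, ends in the global minimum

print("from  2.0:", round(x_right, 3), "error", round(f(x_right), 3))
print("from -2.0:", round(x_left, 3), "error", round(f(x_left), 3))
```

Which minimum is reached depends only on the starting point, which is why random initial weights can lead a network to a solution that is not the best possible.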

Think of Figure 4.8a as representing the search space of ANN weights where

the darker areas represent regions of better performance. The optimal location is

marked “Best”. The random starting position is marked by the “X”. At each
