2. Leave-one-out, an extreme example of K-fold, in which we subdivide into as many parts as there are observations. We leave one observation out of our classification procedure and use the remaining n - 1 observations as a training set. Repeating this procedure n times, omitting a different observation each time, we arrive at a figure for the number and percentage of observations classified correctly (a sketch of this and the delete-d procedure follows this list). A method that requires this much computation would have been unthinkable before the advent of inexpensive, readily available high-speed computers. Today, at worst, we need only step out for a cup of coffee while our desktop completes its efforts.
3. Jackknife, an obvious generalization of the leave-one-out approach,
where the number left out can range from one observation to half
the sample.
4. Delete-d, where we set aside a random percentage d of the observations for validation purposes, use the remaining (100 - d)% as a training set, and then average over 100 to 200 such independent random samples.
5. The bootstrap, which we have already considered at length in
earlier chapters.
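To make the mechanics concrete, here is a minimal sketch of the leave-one-out and delete-d procedures in Python with NumPy, assuming a simple straight-line model fit by least squares and a squared-error loss; the arrays x and y (NumPy arrays of predictor and response values) and the helper names below are our own illustrative choices, not part of the original text.

import numpy as np

def fit_line(x, y):
    # Ordinary least-squares fit of y = a + b*x (an illustrative model only).
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

def leave_one_out_sse(x, y):
    # Omit each observation in turn, refit on the remaining n - 1 points,
    # and accumulate the squared error of the prediction for the omitted point.
    n = len(x)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        a, b = fit_line(x[keep], y[keep])
        total += (y[i] - (a + b * x[i])) ** 2
    return total

def delete_d_sse(x, y, d_frac=0.10, n_splits=200, seed=None):
    # Set aside a random fraction of the observations for validation, train on
    # the rest, and average the validation SSE over many independent random
    # splits (100 to 200, as in the text).
    rng = np.random.default_rng(seed)
    n = len(x)
    d = max(1, round(d_frac * n))
    losses = []
    for _ in range(n_splits):
        train = np.ones(n, dtype=bool)
        train[rng.choice(n, size=d, replace=False)] = False
        a, b = fit_line(x[train], y[train])
        losses.append(np.sum((y[~train] - (a + b * x[~train])) ** 2))
    return float(np.mean(losses))

Squared error is used here only for concreteness; any loss function appropriate to the problem at hand could be substituted, and either routine can be applied to a more elaborate model by replacing fit_line.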
The correct choice among these methods in any given instance is still a
matter of controversy (though any individual statistician will assure you
the matter is quite settled). See, for example, Wu [1986], together with the discussion that follows it, and Shao and Tu [1995].
Leave-one-out has the advantage of allowing us to study the influence
of specific observations on the overall outcome.
Our own opinion is that if any of the above methods suggest that the
model is unstable, the first step is to redefine the model over a more
restricted range of the various variables. For example, with the data of
Figure 9.3, we would advocate confining attention to observations for
which the predictor (TNFAlpha) was less than 200.
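As a purely hypothetical sketch of that restriction, assuming the data behind Figure 9.3 were available as a pandas DataFrame with a TNFAlpha column (the file and variable names below are our own assumptions, not the book's):

import pandas as pd

# Hypothetical: load the Figure 9.3 data; the file name is an assumption.
data = pd.read_csv("tnf_alpha.csv")

# Confine attention to observations with TNFAlpha below 200 before refitting.
restricted = data[data["TNFAlpha"] < 200]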
If a more general model is desired, then many additional observations
should be taken in underrepresented ranges. In the cited example, this
would be values of TNFAlpha greater than 300.
MEASURES OF PREDICTIVE SUCCESS
Whatever method of validation is used, we need to have some measure of
the success of the prediction procedure. One possibility is to use the sum
of the losses in the calibration and the validation sample. Even this procedure contains an ambiguity that we need to resolve. Are we more concerned with minimizing the expected loss, the average loss, or the maximum loss?
One measure of goodness of fit of the model is SSE = Σ(yᵢ − yᵢ*)², where yᵢ and yᵢ* denote the ith observed value and the corresponding predicted value.
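To pin down this notation, here is a short sketch in Python with NumPy; the function names and the use of squared error as the per-observation loss are our own illustrative choices rather than anything prescribed by the text.

import numpy as np

def sse(y_obs, y_pred):
    # SSE = sum over i of (y_i - y_i*)^2, observed minus predicted.
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_obs - y_pred) ** 2))

def average_loss(y_obs, y_pred):
    # Average squared loss per observation.
    return sse(y_obs, y_pred) / len(y_obs)

def maximum_loss(y_obs, y_pred):
    # Largest single squared error; minimizing this protects the worst case.
    resid = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.max(resid ** 2))

Which of these summaries to minimize is exactly the ambiguity raised above; the same residuals can lead to different model choices depending on the summary chosen.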