2. Leave-one-out, an extreme example of K-fold, in which we subdivide into as many parts as there are observations. We leave one observation out of our classification procedure and use the remaining n - 1 observations as a training set. Repeating this procedure n times, omitting a different observation each time, we arrive at a figure for the number and percentage of observations classified correctly (a sketch of this and the delete-d procedure follows this list). A method that requires this much computation would have been unthinkable before the advent of inexpensive, readily available high-speed computers. Today, at worst, we need only step out for a cup of coffee while our desktop completes its efforts.
3. Jackknife, an obvious generalization of the leave-one-out approach,
where the number left out can range from one observation to half
the sample.
4. Delete-d, where we set aside a random percentage d of the observations for validation purposes, use the remaining (100 - d)% as a training set, and then average over 100 to 200 such independent random samples.
5. The bootstrap, which we have already considered at length in
earlier chapters.
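To make the mechanics concrete, here is a minimal sketch of the leave-one-out and delete-d procedures in Python with NumPy, assuming a simple straight-line model fit by least squares and a squared-error loss; the arrays x and y (NumPy arrays of predictor and response values) and the helper names below are our own illustrative choices, not part of the original text.

import numpy as np

def fit_line(x, y):
    # Ordinary least-squares fit of y = a + b*x (an illustrative model only).
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

def leave_one_out_sse(x, y):
    # Omit each observation in turn, refit on the remaining n - 1 points,
    # and accumulate the squared error of the prediction for the omitted point.
    n = len(x)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        a, b = fit_line(x[keep], y[keep])
        total += (y[i] - (a + b * x[i])) ** 2
    return total

def delete_d_sse(x, y, d_frac=0.10, n_splits=200, seed=None):
    # Set aside a random fraction of the observations for validation, train on
    # the rest, and average the validation SSE over many independent random
    # splits (100 to 200, as in the text).
    rng = np.random.default_rng(seed)
    n = len(x)
    d = max(1, round(d_frac * n))
    losses = []
    for _ in range(n_splits):
        train = np.ones(n, dtype=bool)
        train[rng.choice(n, size=d, replace=False)] = False
        a, b = fit_line(x[train], y[train])
        losses.append(np.sum((y[~train] - (a + b * x[~train])) ** 2))
    return float(np.mean(losses))

Squared error is used here only for concreteness; any loss function appropriate to the problem at hand could be substituted, and either routine can be applied to a more elaborate model by replacing fit_line.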
The correct choice among these methods in any given instance is still a
matter of controversy (though any individual statistician will assure you
the matter is quite settled). See, for example, Wu [1986], together with the discussion that follows it, and Shao and Tu [1995].
Leave-one-out has the advantage of allowing us to study the influence
of specific observations on the overall outcome.
Our own opinion is that if any of the above methods suggest that the
model is unstable, the first step is to redefine the model over a more
restricted range of the various variables. For example, with the data of
Figure 9.3, we would advocate confining attention to observations for
which the predictor (TNFAlpha) was less than 200.
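As a purely hypothetical sketch of that restriction, assuming the data behind Figure 9.3 were available as a pandas DataFrame with a TNFAlpha column (the file and variable names below are our own assumptions, not the book's):

import pandas as pd

# Hypothetical: load the Figure 9.3 data; the file name is an assumption.
data = pd.read_csv("tnf_alpha.csv")

# Confine attention to observations with TNFAlpha below 200 before refitting.
restricted = data[data["TNFAlpha"] < 200]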
If a more general model is desired, then many additional observations
should be taken in underrepresented ranges. In the cited example, this
would be values of TNFAlpha greater than 300.
MEASURES OF PREDICTIVE SUCCESS
Whatever method of validation is used, we need to have some measure of
the success of the prediction procedure. One possibility is to use the sum
of the losses in the calibration and the validation sample. Even this procedure contains an ambiguity that we need to resolve. Are we more concerned with minimizing the expected loss, the average loss, or the maximum loss?
One measure of goodness of fit of the model is SSE = Σ(yᵢ − yᵢ*)², where yᵢ and yᵢ* denote the ith observed value and the corresponding predicted value.
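To pin down this notation, here is a short sketch in Python with NumPy; the function names and the use of squared error as the per-observation loss are our own illustrative choices rather than anything prescribed by the text.

import numpy as np

def sse(y_obs, y_pred):
    # SSE = sum over i of (y_i - y_i*)^2, observed minus predicted.
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_obs - y_pred) ** 2))

def average_loss(y_obs, y_pred):
    # Average squared loss per observation.
    return sse(y_obs, y_pred) / len(y_obs)

def maximum_loss(y_obs, y_pred):
    # Largest single squared error; minimizing this protects the worst case.
    resid = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.max(resid ** 2))

Which of these summaries to minimize is exactly the ambiguity raised above; the same residuals can lead to different model choices depending on the summary chosen.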