Chapter 11
Validation
“. . . the simple idea of splitting a sample in two and then developing the hypothesis on the basis of one part and testing it on the remainder may perhaps be said to be one of the most seriously neglected ideas in statistics, if we measure the degree of neglect by the ratio of the number of cases where a method could help to the number of cases where it is actually used.”
G. A. Barnard in discussion following Stone [1974, p. 133].
Validate your models before drawing conclusions.
As we read in the articles by David Freedman and Gail Gong reprinted in the Appendix, absent detailed knowledge of causal mechanisms, the results of a regression analysis are highly suspect. Freedman found highly significant correlations between totally independent variables. Gong resampled repeatedly from the data in hand and obtained a different set of significant variables each time.
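Freedman's point is easy to reproduce by simulation. The sketch below assumes a setup in the spirit of his screening experiment: 100 observations of a response and 50 candidate predictors, all independent standard-normal noise; a first regression that keeps every predictor with p < 0.25; and a second regression on the survivors. These parameters, the normal approximation used for p-values, and the helper names (`ols_t_stats`, `two_sided_p`, `screening_experiment`) are illustrative assumptions, not details taken from the reprinted article.

```python
import math
import numpy as np


def ols_t_stats(X, y):
    """t-statistics for the slope coefficients of an OLS fit that includes an intercept."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])            # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)             # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta[1:] / se[1:]                         # drop the intercept


def two_sided_p(t):
    """Two-sided p-values using a normal approximation to the t distribution (assumption)."""
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])


def screening_experiment(n=100, p=50, screen_level=0.25, seed=0):
    """Hypothetical helper: regress pure noise on pure noise, screen, then refit."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))   # candidate predictors: independent noise
    y = rng.standard_normal(n)        # response: independent of every predictor
    # Pass 1: fit all p predictors and keep those with p-value below screen_level.
    keep = np.flatnonzero(two_sided_p(ols_t_stats(X, y)) < screen_level)
    if keep.size == 0:
        return keep, np.array([])
    # Pass 2: refit using only the predictors that survived screening.
    return keep, two_sided_p(ols_t_stats(X[:, keep], y))


keep, p2 = screening_experiment()
print(f"{keep.size} of 50 noise predictors survived screening; "
      f"{int(np.sum(p2 < 0.05))} look significant at the 5% level on refitting")
```

Because the second regression is fitted to the same data that chose its predictors, its p-values are overstated; rerunning with different seeds will typically leave some noise predictors looking "significant" even though, by construction, no relationship exists.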
A host of advertisements for new proprietary software claim an ability to uncover previously hidden relationships and to overcome the deficiencies of linear regression. But how can we determine whether such claims are true?
Good [2001a, Chapter 10] reports on one such claim from the maker of PolyAnalyst™. He took the 400 records, each with 31 variables, that PolyAnalyst provided in an example dataset, split the data in half at random, and obtained completely discordant results from the two halves, whether they were analyzed with PolyAnalyst, CART, or stepwise linear regression. This was yet another example of a spurious relationship that did not survive the validation process.
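The split-half check Good describes can be applied to any dataset. The sketch below, assuming NumPy, splits the records in half at random, selects variables separately in each half, and reports which selections agree. The model-building step is a crude stand-in (keep every predictor whose OLS coefficient is significant at the 5% level, with a normal approximation to the p-value); it does not reimplement PolyAnalyst, CART, or stepwise regression. The function names and the simulated data are hypothetical: the data only mimic the shape of Good's example (400 records, here one response plus 30 predictors), and a real effect is planted in column 0.

```python
import math
import numpy as np


def ols_t_stats(X, y):
    """t-statistics for the slope coefficients of an OLS fit that includes an intercept."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta[1:] / se[1:]


def select_variables(X, y, alpha=0.05):
    """Crude stand-in for model building: keep predictors with two-sided p < alpha
    (normal approximation to the t distribution)."""
    pvals = [math.erfc(abs(t) / math.sqrt(2)) for t in ols_t_stats(X, y)]
    return {j for j, pv in enumerate(pvals) if pv < alpha}


def split_half_check(X, y, alpha=0.05, seed=0):
    """Hypothetical helper: split rows in half at random and select variables in each half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half_a, half_b = idx[: len(y) // 2], idx[len(y) // 2 :]
    sel_a = select_variables(X[half_a], y[half_a], alpha)
    sel_b = select_variables(X[half_b], y[half_b], alpha)
    return sel_a, sel_b, sel_a & sel_b   # the intersection is what replicates


# Usage: simulated data; only column 0 carries real signal.
rng = np.random.default_rng(42)
X = rng.standard_normal((400, 30))
y = 0.5 * X[:, 0] + rng.standard_normal(400)
sel_a, sel_b, both = split_half_check(X, y)
print("selected in half A:", sorted(sel_a))
print("selected in half B:", sorted(sel_b))
print("selected in both:  ", sorted(both))
```

A relationship worth reporting should be selected in both halves. Noise predictors that clear the threshold in one half by chance seldom reappear in the other, which is the kind of discordance Good observed.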
In this chapter we review the various methods of validation and provide
guidelines for their application.