Appendix A
A Note on Screening Regression Equations
DAVID A. FREEDMAN*
Consider developing a regression model in a context where substantive
theory is weak. To focus on an extreme case, suppose that in fact there is
no relationship between the dependent variable and the explanatory variables.
Even so, if there are many explanatory variables, the R² will be
high. If explanatory variables with small t statistics are dropped and the
equation refitted, the R² will stay high and the overall F will become
highly significant. This is demonstrated by simulation and by asymptotic
calculation.
KEY WORDS: Regression; Screening; R²; F; Multiple testing.
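The screening effect described in the abstract can be reproduced with a small simulation. The sketch below is illustrative only: the sample size n = 100, the p = 50 explanatory variables, the cutoff |t| > 1.15 (roughly a 25% two-sided level), the intercept term, and the function names ols_summary and screen_once are all assumptions made for this example, not details stated in the text above. Every column, including the dependent variable, is independent standard normal noise, so there is no true relationship to find.

```python
import numpy as np


def ols_summary(X, y):
    """Fit y on X plus an intercept; return R^2, overall F, and slope t statistics."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    rss = resid @ resid                           # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)             # total sum of squares about the mean
    r2 = 1.0 - rss / tss
    df_resid = n - p - 1
    sigma2 = rss / df_resid                       # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # estimated covariance of beta
    t = beta[1:] / np.sqrt(np.diag(cov)[1:])      # t statistics for the slopes only
    f = (r2 / p) / ((1.0 - r2) / df_resid)        # overall F for all slopes = 0
    return r2, f, t


def screen_once(n=100, p=50, t_cut=1.15, seed=0):
    """One replication: fit on pure noise, drop small-|t| columns, refit."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))               # explanatory variables: pure noise
    y = rng.standard_normal(n)                    # dependent variable: unrelated noise
    r2_full, f_full, t = ols_summary(X, y)
    keep = np.abs(t) > t_cut                      # assumed screening rule
    if not keep.any():                            # nothing survived screening
        return {"r2_full": r2_full, "f_full": f_full,
                "n_kept": 0, "r2_kept": 0.0, "f_kept": float("nan")}
    r2_kept, f_kept, _ = ols_summary(X[:, keep], y)
    return {"r2_full": r2_full, "f_full": f_full,
            "n_kept": int(keep.sum()), "r2_kept": r2_kept, "f_kept": f_kept}


if __name__ == "__main__":
    for name, value in screen_once().items():
        print(f"{name}: {value}")
```

With Gaussian noise and an intercept, the expected first-pass R² is p/(n - 1), about 0.5 under these assumed sizes. The refitted equation has fewer parameters but an R² that is only somewhat lower, which is why its overall F looks far more significant than the data warrant. Looping over many seeds gives the distribution of these quantities rather than a single draw.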
1. INTRODUCTION
When regression equations are used in empirical work, the ratio of data
points to parameters is often low; furthermore, variables with small coeffi-
cients are often dropped and the equations refitted without them. Some
examples are discussed in Freedman (1981) and Freedman, Rothenberg,
and Sutch (1982, 1983). Such practices can distort the significance levels
of conventional statistical tests. The existence of this effect is well known,
but its magnitude may come as a surprise, even to a hardened statistician.
The object of the present note is to quantify this effect, both through
* David A. Freedman is Professor, Statistics Department, University of California, Berkeley,
CA 94720. This research developed from a project supported by Dr. George Lady, of the
former Office of Analysis Oversight and Access, Energy Information Administration,
Department of Energy, Washington, D.C. I would like to thank David Brillinger, Peter
Guttorp, George Lady, Thomas Permutt, and Thomas Rothenberg for their help.
Reprinted with permission by The American Statistician.