Chapter 10
Multivariable Regression
Multivariable regression is plagued by the same problems univariate
regression is heir to, plus many more of its own. Is the model correct? Are
the associations spurious?
In the univariate case, if the errors were not normally distributed, we
could take advantage of permutation methods to obtain exact significance
levels in tests of the coefficients. Exact permutation methods do not exist
in the multivariable case.
When selecting variables to incorporate in a multivariable model, we are
forced to perform repeated tests of hypotheses, so that the resultant p
values no longer have their nominal meaning. One solution, if sufficient
data are available, is to divide the data set into two parts, using the
first part to select variables and the second part to test these same
variables for significance.
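The split-sample idea can be illustrated with a small simulation. What follows is a minimal sketch, not a recommended procedure: the data are pure noise generated for the example, the selection rule (keep the five predictors with the largest absolute correlation) is a simplified univariate screen chosen for brevity, and the p values come from a Monte Carlo permutation test rather than an exact one. All names (pearson_r, perm_pvalue, keep) are hypothetical.

```python
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def perm_pvalue(x, y, rng, n_perm=500):
    """Two-sided Monte Carlo permutation p value for the correlation of x and y."""
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(0)
n, p, keep = 200, 40, 5
# Assumption for the demo: pure noise, so no predictor is related to y.
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]  # X[j] is column j
y = [rng.gauss(0, 1) for _ in range(n)]

half = n // 2
sel_X = [col[:half] for col in X]    # first part: used to select variables
test_X = [col[half:] for col in X]   # second part: used only for testing
sel_y, test_y = y[:half], y[half:]

# Step 1: select variables using the first part only.
scores = [abs(pearson_r(col, sel_y)) for col in sel_X]
selected = sorted(range(p), key=lambda j: scores[j], reverse=True)[:keep]

# Step 2: test the selected variables on the data that chose them (naive)
# and on the held-out part (the split-sample approach).
naive = [perm_pvalue(sel_X[j], sel_y, rng) for j in selected]
honest = [perm_pvalue(test_X[j], test_y, rng) for j in selected]

for j, pn, ph in zip(selected, naive, honest):
    print(f"x{j}: same-part p = {pn:.3f}, held-out p = {ph:.3f}")
```

Because the predictors were chosen for their large correlations in the first part, their p values on that same part tend to be small even though nothing is there to find; the held-out p values are not biased by the selection step.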
If choosing the correct functional form of a model in a univariate case
presents difficulties, consider that in the case of k variables, there are k
linear terms (should we use logarithms? should we add polynomial terms?)
and k(k - 1)/2 distinct first-order cross products of the form x_i x_j with
i < j. Should we include any of the k(k - 1)(k - 2)/6 distinct second-order
cross products?
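To get a feel for how quickly the list of candidate terms grows, the short sketch below enumerates them for five variables. It counts each product once regardless of order, so k variables yield k(k - 1)/2 distinct pairwise products and k(k - 1)(k - 2)/6 distinct three-way products; counting ordered pairs and triples instead multiplies these figures by 2 and 6. The function name candidate_terms and the variable names x1 through x5 are hypothetical.

```python
from itertools import combinations

def candidate_terms(names):
    """Return linear terms plus distinct pairwise and three-way cross products."""
    linear = list(names)
    pairs = ["*".join(c) for c in combinations(names, 2)]
    triples = ["*".join(c) for c in combinations(names, 3)]
    return linear, pairs, triples

linear, pairs, triples = candidate_terms(["x1", "x2", "x3", "x4", "x5"])
print(len(linear), len(pairs), len(triples))  # 5 10 10
```

With k = 20 the same enumeration gives 190 pairwise and 1140 three-way products, before any transformations of the individual variables are considered.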
Should we use forward stepwise regression, or backward, or some other
method for selecting variables for inclusion? The order of selection can
result in major differences in the final form of the model (see, for
example, Roy [1958] and Goldberger [1961]).
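The effect of selection order can be seen in a small synthetic example. This is a simplified sketch under stated assumptions: the data are simulated so that x3 is a noisy copy of x1 + x2, both procedures use residual sum of squares as the only criterion and stop at a fixed model size of two, whereas stepwise routines in statistical packages typically use F tests or p-value thresholds. The helper names rss, forward, and backward are hypothetical.

```python
import random

def rss(columns, y):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(design, y))

def forward(X, y, size):
    """Greedily add the variable that lowers RSS the most."""
    chosen = []
    while len(chosen) < size:
        rest = [j for j in range(len(X)) if j not in chosen]
        best = min(rest, key=lambda j: rss([X[i] for i in chosen + [j]], y))
        chosen.append(best)
    return chosen

def backward(X, y, size):
    """Start with all variables; drop the one whose removal raises RSS the least."""
    chosen = list(range(len(X)))
    while len(chosen) > size:
        drop = min(chosen, key=lambda j: rss([X[i] for i in chosen if i != j], y))
        chosen.remove(drop)
    return chosen

# Synthetic data (an assumption of this demo): y depends on x1 and x2;
# x3 is a noisy proxy for their sum.
rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [u + v + rng.gauss(0, 0.5) for u, v in zip(x1, x2)]
y = [u + v + rng.gauss(0, 0.1) for u, v in zip(x1, x2)]
X = [x1, x2, x3]

fwd = forward(X, y, 2)
bwd = backward(X, y, 2)
print("forward :", sorted(fwd))   # indices into X; index 2 is x3
print("backward:", sorted(bwd))
```

Forward selection picks the proxy x3 first, because it is the best single predictor, and then cannot discard it; backward elimination sees all three variables at once and drops x3, ending with x1 and x2.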
David Freedman [1983] searched for and found a large and highly
significant R² among totally independent normally distributed random
variables. This article is reproduced in its entirety in Appendix A, and we urge
you to read this material more than once. Freedman demonstrates how