A classic example of this might be if we were predicting a person's weight (Y),
and two of the variables were the person's height and his/her pant length. Clearly,
each of these variables is a significant predictor of weight; nobody can deny that on
average, if a person is taller, he/she weighs more. If we assume that these two X vari-
ables are 99% correlated (to the authors, a reasonable assumption, although we've
never done an actual study!!), the multiple regression results would find each of these
variables not significant! That is because, given that each of the two variables (height,
pant length) is telling us the same thing about a person's weight, neither variable
provides unique (i.e., "above and beyond the other variables") predictive value, and,
statistically, the result is the correct one.
Obviously, what we really want is to retain one of the two variables in our pre-
dictive equation, but we do not need both variables. If you remove both variables
from the equation, you would be harming yourself with respect to getting the best
prediction of a person's weight that you can. In fact, if these were the only two
variables under consideration, and you drop them both, you would have nothing!!
Stepwise regression deals with this issue and would keep one of these two vari-
ables, whichever one was the tiniest bit more predictive than the other, and would bar
the other variable from being in the equation. The one variable of the two that is in
the equation is clearly significant, both statistically and intuitively.
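The height/pant-length phenomenon is easy to reproduce with simulated data. The sketch below (the variable names, sample size, and coefficient values are our own illustrative assumptions, not from the text) builds two predictors that are about 99% correlated, then compares the t-statistics each one earns alone versus in a joint regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60

# Height in cm; pant length is nearly a linear function of height,
# which makes the two predictors correlate at roughly 0.99.
height = rng.normal(170, 10, n)
pant = 0.45 * height + rng.normal(0, 0.7, n)

# Weight truly depends on height only, plus noise.
weight = -60 + 0.75 * height + rng.normal(0, 8, n)

def ols_t_stats(X, y):
    """Coefficients and t-statistics for OLS with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Each predictor alone is strongly significant...
_, t_height_alone = ols_t_stats(height[:, None], weight)
_, t_pant_alone = ols_t_stats(pant[:, None], weight)

# ...but together, neither provides much unique information,
# so both t-statistics shrink dramatically.
_, t_joint = ols_t_stats(np.column_stack([height, pant]), weight)

print(np.corrcoef(height, pant)[0, 1])        # roughly 0.99 by construction
print(t_height_alone[1], t_pant_alone[1])     # large |t| individually
print(t_joint[1], t_joint[2])                 # much smaller |t| jointly
```

The shrinkage is exactly the variance-inflation effect described above: with a correlation of r between the two X's, each coefficient's standard error is inflated by roughly a factor of 1/√(1 − r²).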
10.6.1 HOW DOES STEPWISE REGRESSION WORK?
As silly as it sounds to say, we are saying it: stepwise regression works in steps!
It picks variables one at a time to enter into the equation. The entire process is
automated by the software.
The first step is for the software to run a simple regression with Y and each of
the X's available. These regressions are run "internally"—you do not see (nor wish
to see!) that output; the software picks the variable with the highest r² value. Then it
displays (as you'll see) the results of this one ("winning") regression.
In step 2, stepwise regression runs (internally) a bunch of new regressions; each
regression contains two X's, one being the winner from step 1, and every other X.
So, for example, if there are six X's to begin with (X1, X2, X3, X4, X5, X6), step 1
involves six simple regressions. Now let's assume that X3 has the highest r², say,
0.35, and is, thus, considered the "winner." In step 2, five regressions would be run;
they would involve two X's each, and all would include X3. Ergo, the new step 2
regressions would be Y/(X1 and X3), Y/(X2 and X3), Y/(X4 and X3), Y/(X5 and
X3), and finally, Y/(X6 and X3). Next, which pair of X's together has the highest
r² is identified. Imagine the overall r² with X3 and X6 is the highest, say, 0.59. This
two-variable regression would be displayed on the output.
Onward to step 3. Four regressions are run that contain X3 and X6 and each
other eligible variable (X1, X2, X4, and X5). Again, the highest overall r² of the four
regressions would be identified, and that variable would enter the equation. And so
forth; the process continues.
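The forward pass just described can be sketched in a few lines. The sketch below is our own simplified version: it scores each candidate by overall r², exactly as in the six-X walkthrough, and uses an assumed minimum-improvement stopping rule (real stepwise software typically uses significance tests and may also drop previously entered variables, which we omit here):

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on X (intercept included)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection: each step, try every not-yet-entered
    column alongside the current winners, enter the one giving the
    highest overall R-squared, and stop when the best candidate
    improves R-squared by less than min_gain (an assumed rule)."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2

# Mirror the text's example: six X's, where (in 0-based terms)
# column 2 plays the role of X3 and column 5 the role of X6.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2 * X[:, 2] + X[:, 5] + rng.normal(0, 0.5, 200)

sel, r2 = forward_stepwise(X, y)
print(sel, round(r2, 3))  # column 2 enters first, then column 5
```

Note how step 1 is the six internal simple regressions, step 2 is the five two-variable regressions all containing the step-1 winner, and so on, just as described above.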
This may sound daunting, but don't forget that it is all automated by the software;
one click and you're done! Based on some other features of Stepwise Regression