SIDEBAR: NONSIGNIFICANT BUT USEFUL VARIABLES
It is possible that some of the nonsignificant variables belong in the final conclusion about important variables to help us predict Y. How can that be, if they, indeed, are not significant?
The answer is that perhaps two (or more) of the nonsignificant variables are providing exactly the same, useful, information. Since each provides the same information, neither variable adds anything unique, and thus both would show up as not significant. Regression analysis can be pretty subtle!
With this added complexity, what do we do? We certainly do not want to run a "zillion" regression analyses among the eight nonsignificant variables; even if we limited ourselves to two at a time, there would be 28 different regressions. A simple illustration might help explain the paradox of nonsignificant variables that are still useful. Imagine three X's in total:
X1 provides information content 1-10 (out of 100 units of information that exist),
X2 provides information content 11-20, and
X3 provides information content 1-20.
Any one of the X's could be left out without harming the total information we have: 20 units. If X1 is left out, we still have information content 1-20 in total, provided by X2 and X3 (in fact, by X3 alone); if, instead, X2 is left out, we still have information content 1-20 in total, provided by X1 and X3 (in fact, by X3 alone); if X3 is left out, we still have information content 1-20 in total, provided by X1 and X2.
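The bookkeeping above can be sketched in a few lines of Python. The sets of "information units" are, of course, a made-up illustration (information content is not literally a set of labels), but they make the overlap argument concrete, and we can also verify the count of 28 pairwise regressions:

```python
from math import comb

# Treat each variable's "information content" as a set of unit labels
# (an invented illustration matching the sidebar's numbering).
info = {
    "X1": set(range(1, 11)),   # units 1-10
    "X2": set(range(11, 21)),  # units 11-20
    "X3": set(range(1, 21)),   # units 1-20
}

all_units = info["X1"] | info["X2"] | info["X3"]
print(len(all_units))  # 20 units in total

# Dropping any single variable loses nothing: the others still cover all 20 units.
for dropped in info:
    remaining = set().union(*(s for name, s in info.items() if name != dropped))
    assert remaining == all_units

# Number of two-at-a-time regressions among 8 nonsignificant variables:
print(comb(8, 2))  # 28
```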
Therefore, the t-test, which evaluates each variable one at a time, would not find any of these three variables useful at all (i.e., none of the three adds anything that cannot be gotten from the other two) and thus will find all three variables not significant. Yet you want X3 (or, second best, X1 and X2) in your equation to maximize the number of units of information in the equation.
Interestingly, you would have a "hint" that something like this might be occurring, because, in this case, with only the three aforementioned variables, the F-test, the test of the overall model, would be significant, since it would identify that having the three variables is, beyond a reasonable doubt, better than having no variables at all!
However, in our 15-variable problem, this help (from the F-test) is not forthcoming: the F-test will be significant whether or not the subtlety among a few of the nonsignificant variables exists; it will be significant simply because of "the big 7" variables.
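The pattern of individually nonsignificant variables in a clearly significant model can be seen numerically in a small simulation. The following NumPy-only sketch (data, seed, and variable names are invented for illustration) builds an x3 that is nearly the sum of x1 and x2, so each coefficient's one-at-a-time t-statistic is typically modest, while the overall F-statistic is unmistakably large:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)  # nearly redundant given x1 and x2
y = x3 + rng.normal(size=n)                    # Y depends on the shared information

X = np.column_stack([np.ones(n), x1, x2, x3])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
p = X.shape[1]
rss = resid @ resid
sigma2 = rss / (n - p)                         # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se                            # one-at-a-time t-statistics

tss = ((y - y.mean()) ** 2).sum()
f_stat = ((tss - rss) / (p - 1)) / sigma2      # overall F-statistic

print("t-statistics (x1, x2, x3):", t_stats[1:])  # typically all small in magnitude
print("overall F-statistic:", f_stat)             # large: the model clearly beats no model
```

Because x1, x2, and x3 carry almost the same joint information, the standard errors of the individual coefficients are inflated, deflating each t-statistic, yet the model as a whole explains most of the variation in Y, which is exactly what the F-test detects.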
So, what do we do? Luckily, there is a technique purposely developed to address this issue, called stepwise regression, which we mentioned before and will discuss in the next section. However, stepwise regression is not available in bare-bones Excel. (Several add-ins to Excel, such as XL Miner, provide it, but basic Excel does not have the capability.) If you have SPSS, you're covered.
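To give a feel for the idea before the next section, here is a minimal sketch of forward stepwise selection in Python. This is not SPSS's actual algorithm (real stepwise procedures also re-test entered variables for possible removal), and the entry threshold and simulated data are invented for illustration:

```python
import numpy as np

def forward_stepwise(X, y, f_to_enter=4.0):
    """Greedy forward selection sketch: at each step, enter the column that
    most reduces the residual sum of squares, provided its partial F-statistic
    exceeds the entry threshold. Returns the 0-indexed columns entered, in order."""
    n, k = X.shape
    selected = []
    resid = y - y.mean()           # start from the intercept-only model
    rss = resid @ resid
    while True:
        best = None
        for j in range(k):
            if j in selected:
                continue
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            b, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = y - A @ b
            new_rss = r @ r
            df = n - len(cols) - 1
            f = (rss - new_rss) / (new_rss / df)   # partial F for adding column j
            if best is None or f > best[1]:
                best = (j, f, new_rss)
        if best is None or best[1] < f_to_enter:
            return selected        # no remaining variable earns its way in
        selected.append(best[0])
        rss = best[2]

# Demo on simulated data where x3 nearly equals x1 + x2 (made-up for illustration):
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)
y = x3 + rng.normal(size=n)
chosen = forward_stepwise(np.column_stack([x1, x2, x3]), y)
print("columns entered (0-indexed):", chosen)  # x3 (column 2) enters first
```

Because x3 carries all the usable information by itself, it enters first, and x1 and x2 then have little left to contribute; this is how stepwise regression sidesteps the one-at-a-time t-test blind spot described above.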
OK, now let's present our multiple regression analysis using SPSS. The data (first 18 rows out of the 180 data points) are shown in Figure 10.10. We already used Variable View to relabel the variables into X's and Y.
The SPSS dialog box for Linear Regression is shown in Figure 10.11 .
The dependent variable is noted as Y. The independent variables are X1 through X15, even though you can see only X1, X2, and X3; if you scrolled down, you would see all 15 X's. We are ready to click on "OK" and obtain our output, which is displayed in Figure 10.12. (However, please note in Figure 10.11 the "Method:" Enter
 