SIDEBAR: NONSIGNIFICANT BUT USEFUL VARIABLES
It is possible that some of the nonsignificant variables belong in the final conclusion about important variables to help us predict Y. How can that be, if they, indeed, are not significant?
The answer is that perhaps two (or more) of the nonsignificant variables are providing exactly the same, useful, information. Since each provides the same information, neither variable adds anything unique, and thus both would show up as not significant. Regression analysis can be pretty subtle!
With this added complexity, what do we do? We certainly do not want to run a "zillion" regression analyses among the eight nonsignificant variables; even if we limited ourselves to two at a time, there would be 28 different regressions. A simple illustration might help explain the paradox of nonsignificant variables that are still useful. Imagine three X's in total:
X1 provides information content 1-10 (out of 100 units of information that exist),
X2 provides information content 11-20, and
X3 provides information content 1-20.
Any one of the X's could be left out without harming the total information we have: 20 units. If X1 is left out, we still have information content 1-20 in total, provided by X2 and X3 (in fact, by X3 alone); if, instead, X2 is left out, we still have information content 1-20 in total, provided by X1 and X3 (in fact, by X3 alone); if X3 is left out, we still have information content 1-20 in total, provided by X1 and X2.
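The bookkeeping above can be sketched in a few lines of Python. The sets of "information units" are, of course, a made-up illustration (information content is not literally a set of labels), but they make the overlap argument concrete, and we can also verify the count of 28 pairwise regressions:

```python
from math import comb

# Treat each variable's "information content" as a set of unit labels
# (an invented illustration matching the sidebar's numbering).
info = {
    "X1": set(range(1, 11)),   # units 1-10
    "X2": set(range(11, 21)),  # units 11-20
    "X3": set(range(1, 21)),   # units 1-20
}

all_units = info["X1"] | info["X2"] | info["X3"]
print(len(all_units))  # 20 units in total

# Dropping any single variable loses nothing: the others still cover all 20 units.
for dropped in info:
    remaining = set().union(*(s for name, s in info.items() if name != dropped))
    assert remaining == all_units

# Number of two-at-a-time regressions among 8 nonsignificant variables:
print(comb(8, 2))  # 28
```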
Therefore, the t-test, which evaluates each variable one at a time, would not find any of these three variables useful at all (i.e., none of the three adds anything that cannot be gotten from the other two) and thus will find all three variables not significant. Yet you want X3 (or, second best, X1 and X2) in your equation to maximize the number of units of information in the equation.
Interestingly, you would have a "hint" that something like this might be occurring, because, in this case, with only the three aforementioned variables, the F-test, the test of the overall model, would be significant, since it would identify that having the three variables is, beyond a reasonable doubt, better than having no variables at all!
However, in our 15-variable problem, this help (from the F-test) is not forthcoming: the F-test will be significant whether or not the subtlety among a few of the nonsignificant variables exists; it will be significant simply because of "the big 7" variables.
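The pattern of individually nonsignificant variables in a clearly significant model can be seen numerically in a small simulation. The following NumPy-only sketch (data, seed, and variable names are invented for illustration) builds an x3 that is nearly the sum of x1 and x2, so each coefficient's one-at-a-time t-statistic is typically modest, while the overall F-statistic is unmistakably large:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)  # nearly redundant given x1 and x2
y = x3 + rng.normal(size=n)                    # Y depends on the shared information

X = np.column_stack([np.ones(n), x1, x2, x3])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
p = X.shape[1]
rss = resid @ resid
sigma2 = rss / (n - p)                         # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se                            # one-at-a-time t-statistics

tss = ((y - y.mean()) ** 2).sum()
f_stat = ((tss - rss) / (p - 1)) / sigma2      # overall F-statistic

print("t-statistics (x1, x2, x3):", t_stats[1:])  # typically all small in magnitude
print("overall F-statistic:", f_stat)             # large: the model clearly beats no model
```

Because x1, x2, and x3 carry almost the same joint information, the standard errors of the individual coefficients are inflated, deflating each t-statistic, yet the model as a whole explains most of the variation in Y, which is exactly what the F-test detects.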
So, what do we do? Luckily, there is a technique purposely developed to address this issue, called stepwise regression, which we mentioned before and will discuss in the next section. However, stepwise regression is not available in bare-bones Excel. (Several add-ins to Excel, such as XL Miner, provide it, but basic Excel does not have the capability.) If you have SPSS, you're covered.
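To give a feel for the idea before the next section, here is a minimal sketch of forward stepwise selection in Python. This is not SPSS's actual algorithm (real stepwise procedures also re-test entered variables for possible removal), and the entry threshold and simulated data are invented for illustration:

```python
import numpy as np

def forward_stepwise(X, y, f_to_enter=4.0):
    """Greedy forward selection sketch: at each step, enter the column that
    most reduces the residual sum of squares, provided its partial F-statistic
    exceeds the entry threshold. Returns the 0-indexed columns entered, in order."""
    n, k = X.shape
    selected = []
    resid = y - y.mean()           # start from the intercept-only model
    rss = resid @ resid
    while True:
        best = None
        for j in range(k):
            if j in selected:
                continue
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            b, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = y - A @ b
            new_rss = r @ r
            df = n - len(cols) - 1
            f = (rss - new_rss) / (new_rss / df)   # partial F for adding column j
            if best is None or f > best[1]:
                best = (j, f, new_rss)
        if best is None or best[1] < f_to_enter:
            return selected        # no remaining variable earns its way in
        selected.append(best[0])
        rss = best[2]

# Demo on simulated data where x3 nearly equals x1 + x2 (made-up for illustration):
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)
y = x3 + rng.normal(size=n)
chosen = forward_stepwise(np.column_stack([x1, x2, x3]), y)
print("columns entered (0-indexed):", chosen)  # x3 (column 2) enters first
```

Because x3 carries all the usable information by itself, it enters first, and x1 and x2 then have little left to contribute; this is how stepwise regression sidesteps the one-at-a-time t-test blind spot described above.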
OK, now let's present our multiple regression analysis using SPSS. The data (first 18 rows out of the 180 data points) are shown in Figure 10.10. We already used Variable View to relabel the variables into X's and Y.
The SPSS dialog box for Linear Regression is shown in Figure 10.11 .
The dependent variable is noted as Y. The independent variables are X1 through X15, even though you can see only X1, X2, and X3; if you scrolled down, you would see all 15 X's. We are ready to click on "OK" and obtain our output, which is displayed in Figure 10.12. (However, please note in Figure 10.11 the "Method:" Enter
 