Can you relate in multiple ways? Multiple linear regression and stepwise regression - Improving the User Experience through Practical Data Analytics

SIDEBAR: HOLD ON! WHAT HAPPENED TO THE HIGHER R-SQUARE I HAD WITH GOOD OLD-FASHIONED MULTIPLE REGRESSION?

We might recall that the value of r² was 0.493 when we used all 15 variables, most of which were not significant. Now we have a value of “only” 0.469.

We reiterate what we noted earlier: the 0.493 is a bit misleading, in that it includes the sum of a bunch of small values added to the r² by variables that really cannot be said to add value to predicting Y. When we eliminate all these small “fake” additions to r², we end up with 0.469, or 46.9%. If this sounds subtle, we sympathize, but do not apologize. Regression analysis is a fairly complex topic and has many subtle areas, most of which, fortunately, you do not have to be concerned with.

Yc = 0.528 + 0.311*X15 + 0.177*X7 + 0.121*X11 + 0.153*X1 + 0.106*X3 + 0.106*X2 + 0.055*X6,

or, if we order the variables by subscript,

Yc = 0.528 + 0.153*X1 + 0.106*X2 + 0.106*X3 + 0.055*X6 + 0.177*X7 + 0.121*X11 + 0.311*X15.

In other words, this equation says that if we plug in a person's value for X1, X2, X3, X6, X7, X11, and X15, we get our best prediction for what the person will put for Y, the likelihood on the 5-point scale that he/she will adopt the search engine. For example, if we arbitrarily assume a person gives a “4” to each of the seven X's in the equation, the Yc comes out 4.64. Of course, an individual responder cannot respond 4.64, since the value chosen must be an integer. The right way to think of this is that if we had a large number of people who answered “4” for each of the variables, the mean response for the Y, likelihood to adopt the search engine, is predicted to be 4.64.
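The arithmetic behind the 4.64 figure can be checked with a few lines of Python. This is only a minimal sketch: the intercept and coefficients are copied from the equation above, while the names `INTERCEPT`, `COEFFICIENTS`, and `predict_yc` are illustrative and do not come from the original analysis.

```python
# Intercept and coefficients copied from the stepwise equation in the text.
INTERCEPT = 0.528
COEFFICIENTS = {
    "X1": 0.153,
    "X2": 0.106,
    "X3": 0.106,
    "X6": 0.055,
    "X7": 0.177,
    "X11": 0.121,
    "X15": 0.311,
}


def predict_yc(ratings):
    """Return Yc for a dict mapping each variable name to a 1-5 rating."""
    return INTERCEPT + sum(
        coef * ratings[name] for name, coef in COEFFICIENTS.items()
    )


# The example from the text: a rating of 4 on every one of the seven X's.
all_fours = {name: 4 for name in COEFFICIENTS}
print(round(predict_yc(all_fours), 2))  # 4.64
```

The seven coefficients sum to 1.029, so the prediction is 0.528 + 4 × 1.029 = 4.644, which rounds to the 4.64 quoted in the text.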

We want to add one more piece of potentially useful information about interpreting the bottom portion of Figure 10.16. You will notice that there is a column (the right-hand-most column shown in the bottom portion of Figure 10.16) called “Coefficients Beta.” In a stepwise regression, where there is relatively little overlap among the X variables in the equation (remember: a strength of the way stepwise regression works is that if there were a lot of overlap between two variables, one of the two would not be in the equation!), the magnitudes of these “Beta values” roughly (not exactly, but likely close enough) reflect, in some sense, the relative importance of the variables. Here, the order is:

Ability to perform a Boolean search

Ability to search by skills

Ability to search by job title

Ability to search candidates by companies in which they have worked

Ability to search by location

Ability to search by years of experience

Ability to search candidates by level of education

This order is reasonably close to the order that is considered by many to be the true order of importance. Of course, these results are based on a sample, and not the total population, so you should not expect that the order would come out “perfectly.”
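To see why the “Beta values,” rather than the raw coefficients in the equation, are the ones used for ranking, it may help to recall how a standardized coefficient is usually defined: the raw coefficient multiplied by the standard deviation of its X and divided by the standard deviation of Y. The sketch below assumes the Beta column in Figure 10.16 follows that usual definition. The function name `standardized_beta` and every number in it are hypothetical and are not taken from the figure.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert a raw regression coefficient b into a standardized beta."""
    return b * sd_x / sd_y


# Hypothetical values for illustration only, NOT taken from Figure 10.16.
sd_y = 1.0

# A larger raw coefficient on a variable whose ratings vary little...
narrow = standardized_beta(b=0.30, sd_x=0.5, sd_y=sd_y)

# ...versus a smaller raw coefficient on a variable whose ratings vary a lot.
wide = standardized_beta(b=0.20, sd_x=1.2, sd_y=sd_y)

print(narrow, wide)
print("variable with the smaller raw coefficient ranks higher:", wide > narrow)
```

Under these made-up inputs, the variable with the smaller raw coefficient ends up with the larger Beta, which is why an ordering by Beta need not match an ordering by the coefficients in the equation.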
