Will anybody buy? Logistic regression - Improving the User Experience through Practical Data Analytics


SIDEBAR: JUST MAX 'EM OUT AND BE A PRO: CMAX AND CPRO CRITERIA (cont'd)

Secondly, some people believe that the Cmax criterion is too strict; they argue that a more
realistic criterion is one called “Cpro,” which would be, in our example, (8/12)² + (4/12)² ≈ 0.56, or
56%. The logic of Cpro is that it is not realistic to predict everybody to be in the highest-frequency
category. So, the issue is phrased as follows: “How many would be guessed correctly if you had to
allocate 8 people of the 12 to 0's, and 4 to 1's; that is, insisting that the prediction includes an
allocation that duplicates the proportions in the actual data?” It turns out that the answer to this
question is to add up the squares of the two proportions. (It turns out that 75% is not statistically
significantly above 56% either, due to the small sample size of 12; however, it's a close case, since the
p-value is about 0.089, not that much above 0.05.) Perhaps the quickest and easiest guideline we
can give you would be to be happy if (1) for a sample size of at least 25, your results exceed Cmax,
or (2) for a sample size of at least 100, your results beat Cpro by at least 0.08.
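The two criteria are simple enough to compute directly. The following is a minimal sketch, not the authors' code: it assumes Cmax is the proportion of cases in the largest category (8/12 in the example), takes Cpro as the sum of the squared proportions, and wraps the rule of thumb in a hypothetical helper named `beats_guideline`.

```python
from collections import Counter


def cmax_cpro(outcomes):
    """Return (Cmax, Cpro) for a list of observed category labels."""
    n = len(outcomes)
    proportions = [count / n for count in Counter(outcomes).values()]
    cmax = max(proportions)                  # assumed: share of the largest category
    cpro = sum(p * p for p in proportions)   # sum of squared proportions
    return cmax, cpro


def beats_guideline(hit_rate, outcomes):
    """Hypothetical helper applying the sidebar's rule of thumb."""
    n = len(outcomes)
    cmax, cpro = cmax_cpro(outcomes)
    exceeds_cmax = n >= 25 and hit_rate > cmax           # rule (1)
    beats_cpro = n >= 100 and hit_rate >= cpro + 0.08    # rule (2)
    return exceeds_cmax or beats_cpro


# The example from the text: 8 zeros and 4 ones.
example = [0] * 8 + [1] * 4
cmax, cpro = cmax_cpro(example)
print(f"Cmax = {cmax:.2f}, Cpro = {cpro:.2f}")  # prints "Cmax = 0.67, Cpro = 0.56"
```

With only 12 cases, `beats_guideline(0.75, example)` returns `False`, because neither sample-size threshold in the guideline is met.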

The third (bottom) section, Variables in the Equation, is extremely useful. It
gives you information about the incremental contribution of each of the predictor
variables, using the Wald test. (It is analogous to the t-test results of regular linear
regression.) Scan down the column labeled Sig. and look for values less than 0.05; these are the
variables that contribute significantly to the predictive power of the model above and
beyond the other variables in the model.
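For readers who want to see where a Sig. value comes from, here is a rough sketch of the usual Wald calculation: the coefficient divided by its standard error is squared and referred to a chi-square distribution with one degree of freedom. The function name `wald_p_value` and the inputs used below are illustrative assumptions, not values taken from the SPSS output.

```python
import math


def wald_p_value(coefficient, std_error):
    """Return (Wald statistic, p-value) for one predictor.

    Wald = (b / SE)^2, compared against chi-square with 1 df.
    For 1 df, the upper-tail probability equals erfc(sqrt(Wald / 2)).
    """
    wald = (coefficient / std_error) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# Illustrative numbers only (not from Figure 11.6).
wald, p = wald_p_value(coefficient=1.96, std_error=1.0)
print(f"Wald = {wald:.3f}, p = {p:.3f}")
```

A coefficient about 1.96 standard errors from zero lands right at the 0.05 cutoff, which is why the Sig. column can be read much like the p-values of the t-tests in linear regression.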

In this case, it turns out that “num_courses” is not quite significant at the traditional
value of 0.05. The p-value, as you can see, is 0.073 (see arrow in Figure 11.6),
a bit over 0.05.

SIDEBAR: THE SHIFTING “CONSTANT” IN SPSS

The authors always find it interesting that the output has the “Constant” listed below the slope of
the X variable(s); in regular linear regression, as you saw in Chapters 9 and 10, the “Constant” is
listed first, followed by the slope of the X variable(s). It's not a big deal, but we'd like to ask the
designers at SPSS: Why? For consideration, we offer guideline 11.2 from the “Research-Based
Web Design and Usability Guidelines” from the United States Department of Health and Human
Services:

“Ensure that the format of common items is consistent from one page to another.”

11.4.1 COMPUTING A PREDICTED PROBABILITY

Now, using the results from the first column of the Variables in the Equation section, we can
note the best-fitting line of the model:

Y*c = −9.776 + 1.617 × X.    (11.6)
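Eqn (11.6) can be evaluated and converted to a probability with a few lines of code. This sketch assumes that Y*c is on the log-odds (logit) scale and that the probability Yc follows from the standard logistic transformation, Yc = 1 / (1 + e^(−Y*c)); the excerpt does not spell this step out, so treat it as an assumption. The function names are hypothetical.

```python
import math

INTERCEPT = -9.776   # "Constant" from Eqn (11.6)
SLOPE = 1.617        # coefficient of X from Eqn (11.6)


def linear_predictor(x):
    """Evaluate the right-hand side of Eqn (11.6) for a given X."""
    return INTERCEPT + SLOPE * x


def predicted_probability(x):
    """Assumed step: standard inverse-logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x)))


x = 8  # the first X data value mentioned in the text
print(f"Y*c = {linear_predictor(x):.3f}")       # prints "Y*c = 3.160"
print(f"Yc  = {predicted_probability(x):.3f}")  # prints "Yc  = 0.959"
```

Under the logistic assumption, X = 8 gives a predicted probability of roughly 0.96 of obtaining a “1.”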

We often wish to find Yc, the predicted probability of obtaining a “1” for various
X values. So, if we take the first X data value of 8, and plug it into Eqn (11.6) above,
we get
