SIDEBAR: THE BINARY LOGISTIC REGRESSION MODEL
Let P(E) stand for the probability that an event, E, occurs. The odds of event E occurring is defined as
P(E)/(1−P(E)).
The odds is a numerical value that ranges from 0 to ∞ (infinity). The natural log (LN) of the odds,
LN{P(E)/(1−P(E))},
ranges from −∞ to ∞, the same range that holds for a value of Yc in "regular" linear regression in previous chapters.
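As a quick numerical check of these definitions, here is a minimal Python sketch (the probability values 0.75 and 0.25 are arbitrary illustrations, not from the text):

```python
import math

def odds(p_e):
    """Odds of event E: P(E) / (1 - P(E))."""
    return p_e / (1 - p_e)

print(odds(0.75))              # 3.0
print(math.log(odds(0.75)))    # about 1.0986
print(math.log(odds(0.25)))    # about -1.0986: log-odds are symmetric around P(E) = 0.5
```

Note that a probability below 0.5 gives odds below 1 and a negative log-odds, while a probability above 0.5 gives odds above 1 and a positive log-odds.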
We specify that
LN{P(E)/(1−P(E))}=a+b*X. (11.1)
In binary logistic regression, with just one X variable (assumed in this discussion for ease of explanation), a predicted P(E) is actually Yc. If we replace P(E) in Eqn (11.1) with Yc, we have
LN{Yc/(1−Yc)}=a+b*X, (11.2)
and if we work backward from Eqn (11.2), we arrive at the ugly-looking expression:
Yc=e^(a+b*X)/(1+e^(a+b*X)). (11.3)
However, as ugly as the expression in Eqn (11.3) is, Yc is easily computed for any X we have, once we determine the values of "a" and "b."
You can note that Yc in Eqn (11.3) is a value from 0 to 1, exactly what is appropriate for a probability. When (a + b*X) is very negative, e^(a+b*X) is very near 0, and Yc is near 0, since we have what is, in loose terms, Yc = (0/(1 + 0)) = 0. When (a + b*X) is very large and positive, e^(a+b*X) is a very high number, say, 1,000,000, and we have, in loose terms, Yc = (1,000,000/1,000,001), a value close to 1.
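The behavior of Eqn (11.3) can be verified numerically; here is a minimal Python sketch (the coefficient values a = −2 and b = 0.5 are invented for illustration):

```python
import math

def logistic(a, b, x):
    """Predicted probability from Eqn (11.3): Yc = e^(a+b*x) / (1 + e^(a+b*x))."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

a, b = -2.0, 0.5  # illustrative coefficients, not estimated from data

# Very negative (a + b*X): Yc is near 0
print(logistic(a, b, -20))   # close to 0
# Very large, positive (a + b*X): Yc is near 1
print(logistic(a, b, 40))    # close to 1
# Every Yc lies strictly between 0 and 1, as a probability must
print(all(0 < logistic(a, b, x) < 1 for x in range(-50, 51)))
```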
If we let
Y*c=LN{Yc/(1−Yc)}, (11.4)
we have a familiar-looking expression,
Y*c=a+b*X. (11.5)
Of course, when we find Y*c in Eqn (11.5), we can then compute Yc by Eqn (11.4), or better, have the software do it for us!!
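The round trip between Yc and Y*c in Eqns (11.4) and (11.3) can be sketched in Python (the probability 0.8 is just an illustration):

```python
import math

def log_odds(yc):
    """Y*c = LN{Yc / (1 - Yc)}, as in Eqn (11.4)."""
    return math.log(yc / (1 - yc))

def inverse_log_odds(y_star):
    """Recover Yc from Y*c, as in Eqn (11.3): Yc = e^(Y*c) / (1 + e^(Y*c))."""
    return math.exp(y_star) / (1 + math.exp(y_star))

yc = 0.8                       # illustrative probability
y_star = log_odds(yc)          # about 1.386
print(round(inverse_log_odds(y_star), 10))  # 0.8 -- the transform is invertible
```

This is exactly the "work backward" step the text describes: the software fits the linear expression on the Y*c scale, then converts back to the probability scale.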
So, in essence, we have a linear regression equation,
Y*c=a+b*X,
or, if there are several X's,
Y*c=a+b1*X1+b2*X2+b3*X3+ ....
But, this time, we cannot find the "a" and the "b" using the least-squares criterion, as we did in Chapters 9 and 10. Instead, we need to use a different method, called "maximum likelihood estimation." While the criterion of least squares chooses the values of "a" and "b" that minimize the sum of squared differences between the actual Y and predicted Y, Yc, the criterion of maximum likelihood estimation finds values of "a" and "b" that maximize the probability of obtaining the sample data you actually have. The good