because these R-square values are not the same as the r² we are accustomed to from
Chapters 9 and 10. In fact, they are usually referred to as “pseudo-R-square measures.”
The second section of the output in Figure 11.6 is called a Classification Table,
which is often the most important and useful part of the output. It indicates how well
the model predicts the correct category for each case. Put another way, it tells us how
well the X (or multiple X's) predicts whether Y is a 1 or a 0.
Specifically, the rows in the Classification Table give the actual number of 1's
and 0's in the data (which, of course, we already know), while the columns give
what the regression process predicts.
In our example, there are eight actual observed 0's (the sum of the “7” and the “1”
in the top row), and seven of them are predicted as 0's, so 87.5% (seven of eight) of
the 0's are, indeed, predicted as 0's. There are four actual observed 1's (the sum of
the “2” and the “2” in the second row), but only two of them are predicted as 1's, so only
50% are predicted correctly. Overall, as the bottom row of the table shows, we correctly
predict 75% (nine of 12) of the data points.
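To make the arithmetic concrete, here is a minimal Python sketch (not part of the software output) that recomputes the three percentages from the counts in the Classification Table. It assumes the table is stored as a nested list, with rows for the observed 0's and 1's and columns for the predicted 0's and 1's; the function name `classification_summary` is our own, hypothetical choice.

```python
def classification_summary(table):
    """Percent correct for observed 0's, observed 1's, and overall.

    table[observed][predicted] holds counts; rows and columns are ordered (0, 1).
    """
    (obs0_pred0, obs0_pred1), (obs1_pred0, obs1_pred1) = table
    pct_0 = 100 * obs0_pred0 / (obs0_pred0 + obs0_pred1)  # observed 0's predicted as 0
    pct_1 = 100 * obs1_pred1 / (obs1_pred0 + obs1_pred1)  # observed 1's predicted as 1
    total = obs0_pred0 + obs0_pred1 + obs1_pred0 + obs1_pred1
    overall = 100 * (obs0_pred0 + obs1_pred1) / total      # all correct predictions
    return pct_0, pct_1, overall


# Counts from our example: top row is observed 0's, second row is observed 1's
table = [[7, 1],
         [2, 2]]
print(classification_summary(table))  # (87.5, 50.0, 75.0)
```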
SIDEBAR: THE CUTOFF POINT
By the way, unless we change a setting (and we do not suggest you do that), if the predicted
probability of a 1 is at least 0.5, the software classifies the result as a “1,” while if the predicted
probability is less than 0.5, the software classifies the result as a “0.” In fact, this 0.5 “cutoff point”
is noted right below the classification table.
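The cutoff rule is simple enough to sketch directly. The Python below is an illustration under our own assumptions: `classify` and `classification_table` are hypothetical helper names, and the four predicted probabilities are made-up values, not numbers from our example. Note that a probability of exactly 0.5 is classified as a 1, matching the “at least 0.5” rule.

```python
def classify(probabilities, cutoff=0.5):
    # A predicted probability at or above the cutoff becomes a 1; anything below becomes a 0
    return [1 if p >= cutoff else 0 for p in probabilities]


def classification_table(observed, predicted):
    # Rows are observed categories (0, 1); columns are predicted categories (0, 1)
    table = [[0, 0], [0, 0]]
    for obs, pred in zip(observed, predicted):
        table[obs][pred] += 1
    return table


# Hypothetical predicted probabilities and observed outcomes, for illustration only
probs = [0.12, 0.50, 0.49, 0.81]
observed = [0, 1, 0, 0]
predicted = classify(probs)
print(predicted)                                  # [0, 1, 0, 1]
print(classification_table(observed, predicted))  # [[2, 1], [0, 1]]
```

Tallying the predictions this way yields the same observed-by-predicted layout that the software reports in its Classification Table.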
Should predicting 75% of the cases correctly be considered “good”? Of course,
as a practical matter, it depends on the real-world situation. However, in the abstract,
statistically speaking, we can reason this way: If you were guessing whether each of
the 12 participants successfully completed the task, given no information at all about
these people (just a code number!!), how many would you guess correctly? 75%?
We doubt it!! So, are you going to take your chances or use logistic regression? Of
course, it's a rhetorical question.
SIDEBAR: JUST MAX 'EM OUT AND BE A PRO: CMAX AND CPRO CRITERIA
Well, you can get 8 of the 12 (67%) correct by predicting all 12 people to be unsuccessful (i.e., 0's).
This is called the “Cmax” criterion, and no other strategy can guarantee a higher percentage of correct
predictions. The strategy is simply to predict that everyone is in the category with the
highest frequency!! If the data had consisted of 25 people, for example, with 15 “1's” and 10
“0's,” then the highest-frequency category would be 1's, and you could guarantee 60% correctly
predicted (100 × 15/25). The binary logistic regression process resulted in 75% predicted correctly. It's
always nice when the regression results predict a higher percentage correct than the Cmax criterion!
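The Cmax criterion is just the largest category count divided by the total count. A minimal Python sketch, using a hypothetical function name `cmax`, reproduces both figures from this sidebar and checks them against the 75% from the logistic regression:

```python
def cmax(counts):
    """Percent correct from predicting every case to be in the most frequent category."""
    return 100 * max(counts) / sum(counts)


print(round(cmax([8, 4]), 1))  # 66.7  (our example: eight 0's, four 1's)
print(cmax([10, 15]))          # 60.0  (the hypothetical 25-person data set)
print(75.0 > cmax([8, 4]))     # True  (the model beats Cmax, at least numerically)
```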
However, a few cautions need to be mentioned. First, with a sample size of only 12, the 75% is
not statistically significantly higher than 67%, so the fact that 75% exceeds 67% is not that
impressive. This is primarily due to the small sample size: the same percentages based on a sample
size of 120 (instead of 12) would be statistically significant at the traditional 0.05
significance level.
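The text does not say which test lies behind this statement. One plausible choice, which we assume here, is a one-sided exact binomial test: if the true chance of a correct prediction were only the Cmax rate of 8/12, how likely is it to get at least 75% correct anyway? The Python sketch below (the helper name `upper_tail` is hypothetical) computes that probability for 9 of 12 and for 90 of 120 correct predictions.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


cmax_rate = 8 / 12                            # assumed null: no better than Cmax
p_small = upper_tail(9, 12, cmax_rate)        # 9 of 12 correct (75%)
p_large = upper_tail(90, 120, cmax_rate)      # 90 of 120 correct (75%)
print(round(p_small, 3), round(p_large, 3))
```

With 12 cases the probability is roughly 0.39, far above 0.05; with 120 cases it falls below 0.05. Under this assumed test, that matches the point that the same 75% becomes significant only with the larger sample.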