Covariate Group    Degrees of Freedom    Wald Statistic    p-value
RC                          6                 41.2         < .0001
CC                         16                 40.9         < .01
BC                          1                  7.9         < .01
JT                          9                 43.7         < .0001
q79                         6                 84.8         < .0001
q82a                        6                 56.5         < .0001
q82b                        6                 34.4         < .0001
q82d                        6                 34.8         < .0001
q82f                        6                 39.9         < .0001
Table 3. Statistical Significance of Covariate Groups
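
The p-value column of Table 3 can be checked with a short calculation. The sketch below assumes that each Wald statistic is referred to a chi-square distribution with the listed degrees of freedom, which is the usual large-sample reference but is not stated explicitly in the text. The helper name chi2_sf is hypothetical and uses only the Python standard library (closed-form sums for integer degrees of freedom).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x must be non-negative")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * pdf * total

# (degrees of freedom, Wald statistic) pairs copied from Table 3.
TABLE_3 = {
    "RC": (6, 41.2), "CC": (16, 40.9), "BC": (1, 7.9), "JT": (9, 43.7),
    "q79": (6, 84.8), "q82a": (6, 56.5), "q82b": (6, 34.4),
    "q82d": (6, 34.8), "q82f": (6, 39.9),
}

if __name__ == "__main__":
    for group, (df, wald) in TABLE_3.items():
        print(f"{group:5s} df={df:2d} Wald={wald:5.1f} p={chi2_sf(wald, df):.2e}")
```

Under this assumption, every computed tail probability falls below the threshold reported in the corresponding row of Table 3.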
                          Predicted Y
Actual Y        1       2       3       4       5     Total
1               3       2       7       4       4        20
2               3       8      48      22       7        88
3               2       3     342     486     133       966
4               0       0     126    1233     723      2082
5               0       0      39     705    1156      1900
Total           8      13     562    2450    2023      5056
Table 4. Confusion Matrix of Multinomial Logistic Regression Model
A perfect model would have a diagonal confusion matrix, indicating that the predicted
value for each customer coincides exactly with the true value. Consider the rows of
Table 4 corresponding to Y=4 and Y=5. These two rows account for almost 80% of the
customers in the sample. In both cases, the predicted value coincides with the actual
value about 60% of the time. In neither case does the model ever predict Y=1 or Y=2, and
it predicts Y=3 only 4% of the time. The mean values of the predicted Y when Y=4 and
Y=5 are 4.28 and 4.59, respectively. The 7% positive bias for the case Y=4 is roughly offset
by the 8% negative bias for the case Y=5 (a mean of 4.59 against an actual value of 5).
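
The row-level figures quoted for Y=4 and Y=5 can be recomputed directly from the counts in Table 4. The following is a minimal sketch; the function name row_summary is hypothetical, and relative bias is assumed to mean (mean predicted Y - actual Y) / actual Y, which is one plausible reading of the percentages in the text.

```python
# Table 4 counts: rows are actual Y = 1..5, columns are predicted Y = 1..5.
CONFUSION = [
    [3, 2, 7, 4, 4],
    [3, 8, 48, 22, 7],
    [2, 3, 342, 486, 133],
    [0, 0, 126, 1233, 723],
    [0, 0, 39, 705, 1156],
]

def row_summary(matrix, actual):
    """Return (hit rate, mean predicted Y, relative bias) for one actual-Y row."""
    row = matrix[actual - 1]
    total = sum(row)
    hit_rate = row[actual - 1] / total
    mean_pred = sum(pred * n for pred, n in enumerate(row, start=1)) / total
    # Assumed definition of bias: signed gap relative to the actual value.
    rel_bias = (mean_pred - actual) / actual
    return hit_rate, mean_pred, rel_bias

def share_of_sample(matrix, actuals):
    """Fraction of all customers whose actual Y is in `actuals`."""
    grand_total = sum(sum(row) for row in matrix)
    return sum(sum(matrix[a - 1]) for a in actuals) / grand_total

if __name__ == "__main__":
    print(f"share of sample with Y in (4, 5): {share_of_sample(CONFUSION, (4, 5)):.1%}")
    for y in (4, 5):
        hit, mean_pred, bias = row_summary(CONFUSION, y)
        print(f"Y={y}: hit rate {hit:.1%}, mean predicted {mean_pred:.2f}, bias {bias:+.1%}")
```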
Looking at the row of Table 4 corresponding to Y=3, we see that 86% of the time the
predicted Y is within 1 of the actual Y. The mean value of the predicted Y is 3.77, indicating
a 26% positive bias. In the rows corresponding to Y=1 and Y=2, which contain only about
2% of the customers, the model struggles to make accurate predictions and often
overestimates the actual value of Y. A hint at the explanation for the noticeable
overestimation associated with the Y=1, Y=2 and Y=3 customers is revealed by examining their
responses to the covariate questions. As just one example, the respective mean scores on
question q79 (“Overall satisfaction with the service event”) are 3.8, 4.1 and 5.2. It seems that
a relatively large number of customers who give a low response to Y are inclined to
give favorable responses to the covariate questions on the same survey. Although
this might be unexpected, it can possibly be explained by the fact that the covariate
questions concern the most recent service event, whereas Y is based on a customer's
cumulative experience.
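
The within-one rate for Y=3 and the tendency to over-predict in the low-Y rows can also be read off Table 4. The sketch below is illustrative only; within_one_rate and over_prediction_rate are hypothetical helper names, and "over-prediction" is taken to mean any predicted value strictly above the actual value.

```python
# Table 4 counts: rows are actual Y = 1..5, columns are predicted Y = 1..5.
CONFUSION = [
    [3, 2, 7, 4, 4],
    [3, 8, 48, 22, 7],
    [2, 3, 342, 486, 133],
    [0, 0, 126, 1233, 723],
    [0, 0, 39, 705, 1156],
]

def within_one_rate(matrix, actual):
    """Fraction of a row whose predicted Y differs from the actual Y by at most 1."""
    row = matrix[actual - 1]
    near = sum(n for pred, n in enumerate(row, start=1) if abs(pred - actual) <= 1)
    return near / sum(row)

def over_prediction_rate(matrix, actual):
    """Fraction of a row whose predicted Y is strictly greater than the actual Y."""
    row = matrix[actual - 1]
    return sum(row[actual:]) / sum(row)

def mean_predicted(matrix, actual):
    """Mean predicted Y for customers with the given actual Y."""
    row = matrix[actual - 1]
    return sum(pred * n for pred, n in enumerate(row, start=1)) / sum(row)

if __name__ == "__main__":
    grand_total = sum(sum(row) for row in CONFUSION)
    low_share = (sum(CONFUSION[0]) + sum(CONFUSION[1])) / grand_total
    print(f"share of sample with Y in (1, 2): {low_share:.1%}")
    print(f"Y=3 within one: {within_one_rate(CONFUSION, 3):.1%}")
    for y in (1, 2, 3):
        print(f"Y={y}: mean predicted {mean_predicted(CONFUSION, y):.2f}, "
              f"over-predicted {over_prediction_rate(CONFUSION, y):.1%}")
```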
Overall, Table 4 reflects significant lift afforded by the multinomial logistic regression model
for predicting Y. For example, a model that utilized no covariate information would have a