Table 5. Misclassification rate for a 25% sample
Target  Outcome  Target Percentage  Outcome Percentage  Count  Total Percentage
Training Data
0       0        80.4               96.6                10070  72.5
1       0        19.6               70.9                2462   17.7
0       1        25.6               3.3                 348    2.5
1       1        74.4               29.1                1010   7.3
Validation Data
0       0        80.2               97.1                7584   72.8
1       0        19.8               71.7                1870   17.9
0       1        23.7               2.9                 229    2.2
1       1        76.2               28.2                735    7.0
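As a rough check on Table 5, the overall misclassification rate can be recovered from the Count column. The sketch below assumes that Count holds raw frequencies and that an observation is misclassified whenever Target and Outcome differ; the function name and the dictionary layout are illustrative only and do not come from the text.

```python
def misclassification_rate(counts):
    """Share of observations whose predicted outcome differs from the target.

    counts maps (target, outcome) -> number of observations (assumed layout).
    """
    total = sum(counts.values())
    wrong = sum(c for (target, outcome), c in counts.items() if target != outcome)
    return wrong / total


# Count column of Table 5, keyed by (Target, Outcome).
training_25 = {(0, 0): 10070, (1, 0): 2462, (0, 1): 348, (1, 1): 1010}
validation_25 = {(0, 0): 7584, (1, 0): 1870, (0, 1): 229, (1, 1): 735}

print(f"{misclassification_rate(training_25):.1%}")    # 20.2%
print(f"{misclassification_rate(validation_25):.1%}")  # 20.1%
```

The same figures follow from adding the Total Percentage values of the two rows where Target and Outcome disagree (17.7 + 2.5 for the training data).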
The cumulative target density is the target density computed over the first n deciles.
The lift for a given decile is the ratio of the target density for that decile to the target density
over all the test data.
The cumulative lift for a given decile is the ratio of the cumulative target density to the target
density over all the test data.
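The three definitions above can be sketched in a few lines of code. This is a minimal illustration, assuming that each observation has a predicted score and a 0/1 target and that deciles are formed by ranking observations from highest to lowest score; the function name `lift_by_decile` and the toy data are hypothetical.

```python
def lift_by_decile(scores, targets, n_bins=10):
    """Per-bin and cumulative lift, with bin 1 holding the highest scores."""
    if len(scores) != len(targets) or not scores:
        raise ValueError("scores and targets must be non-empty and equal in length")

    # Rank observations by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [targets[i] for i in order]
    n = len(ranked)

    # Target density over all the test data.
    overall_density = sum(ranked) / n
    if overall_density == 0:
        raise ValueError("lift is undefined when no observation has target 1")

    rows = []
    cum_hits = 0
    cum_count = 0
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer observations than bins
        cum_hits += sum(chunk)
        cum_count += len(chunk)
        density = sum(chunk) / len(chunk)
        cum_density = cum_hits / cum_count
        rows.append({
            "decile": b + 1,
            "target_density": density,
            "lift": density / overall_density,
            "cumulative_target_density": cum_density,
            "cumulative_lift": cum_density / overall_density,
        })
    return rows


# Toy data: 20 observations, 4 of them with target 1.
toy_scores = [i / 20 for i in range(20, 0, -1)]
toy_targets = [1, 1, 1, 0, 1] + [0] * 15

for row in lift_by_decile(toy_scores, toy_targets):
    print(row["decile"], round(row["lift"], 2), round(row["cumulative_lift"], 2))
```

By construction, the cumulative lift of the last decile equals 1.0, because the cumulative target density over all deciles is the overall target density.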
Given a lift function, we can choose a decile cutpoint and predict that the patients above it are high
risk, choose a second cutpoint and predict that the patients below it are low risk, and make no
definite prediction for those in the center. In that way, we can dismiss those who have no risk and
aggressively treat those at highest risk. Lift allows us to distinguish between patients without assuming a
uniformity of risk. Figure 15 shows the lift for the testing set when we use just the three input variables
of pneumonia, septicemia, and immune disorder.
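The two-cutpoint rule described above amounts to a three-way assignment. The sketch below assumes deciles are numbered 1 (highest predicted risk) through 10 (lowest); the function name `triage` and the cutpoint values used in the example are hypothetical and are not taken from the text.

```python
def triage(decile, high_cut, low_cut):
    """Assign a patient to a risk group from the decile of the predicted score.

    Deciles 1..high_cut are treated as high risk, deciles low_cut..10 as low
    risk, and everything in between gets no definite prediction.
    """
    if not 1 <= high_cut < low_cut <= 10:
        raise ValueError("expected 1 <= high_cut < low_cut <= 10")
    if not 1 <= decile <= 10:
        raise ValueError("decile must be between 1 and 10")
    if decile <= high_cut:
        return "high risk"
    if decile >= low_cut:
        return "low risk"
    return "no definite prediction"


# Example with illustrative cutpoints: top 2 deciles high, bottom 3 low.
for d in (1, 2, 5, 8, 10):
    print(d, triage(d, high_cut=2, low_cut=8))
```

In practice the cutpoints would be read off the lift chart, for example where the lift falls toward 1.0, rather than fixed in advance.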
Random chance corresponds to a lift value of 1.0; values that are higher than 1.0 indicate that the
observations are more predictable than random chance. In this example, 40% of the patient
Table 6. Misclassification rate for a 10% sample
Target  Outcome  Target Percentage  Outcome Percentage  Count  Total Percentage
Training Data
0       0        91.5               99.3                31030  89.4
1       0        8.5                83.5                2899   8.3
0       1        27.3               0.7                 216    0.6
1       1        72.6               16.5                574    1.6
Validation Data
0       0        91.5               99.2                23265  89.3
1       0        8.4                82.4                2148   8.2
0       1        27.8               0.7                 176    0.7
1       1        72.2               17.5                457    1.7
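The percentage columns in these tables appear to be derived from the Count column. The sketch below rests on an interpretation inferred from the numbers rather than stated in the text: Target Percentage is the count as a share of all observations with the same Outcome, Outcome Percentage is the count as a share of all observations with the same Target, and Total Percentage is the count as a share of the whole data set. The function name is illustrative.

```python
def table_percentages(counts):
    """Recompute the three percentage columns from (target, outcome) counts."""
    total = sum(counts.values())
    by_outcome = {}
    by_target = {}
    for (target, outcome), c in counts.items():
        by_outcome[outcome] = by_outcome.get(outcome, 0) + c
        by_target[target] = by_target.get(target, 0) + c

    return {
        (target, outcome): {
            # Share of this predicted outcome that has this actual target.
            "target_pct": 100 * c / by_outcome[outcome],
            # Share of this actual target that received this predicted outcome.
            "outcome_pct": 100 * c / by_target[target],
            "total_pct": 100 * c / total,
        }
        for (target, outcome), c in counts.items()
    }


# Count column of the Table 6 training data, keyed by (Target, Outcome).
training_10 = {(0, 0): 31030, (1, 0): 2899, (0, 1): 216, (1, 1): 574}

for key, row in table_percentages(training_10).items():
    print(key, {name: round(value, 1) for name, value in row.items()})
```

Under this reading, the Outcome Percentage of the row with Target 1 and Outcome 1 is the share of actual positives that the model identifies. The recomputed values agree with the printed ones to within rounding in the last digit.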
 