IF 73.5 <= Age in years at admission < 80.5
AND Indicator of sex EQUALS 1
AND charlson < 0.5
THEN
NODE: 60
N: 2030
1: 46.3%
0: 53.7%
IF 80.5 <= Age in years at admission < 86.5
AND Indicator of sex EQUALS 1
AND charlson < 0.5
THEN
NODE: 61
N: 2038
1: 54.4%
0: 45.6%
IF 0.5 <= charlson < 1.5
AND 57.5 <= Age in years at admission < 70.5
THEN
NODE: 64
N: 5913
1: 48.4%
0: 51.6%
IF 1.5 <= charlson < 2.5
AND 57.5 <= Age in years at admission < 70.5
THEN
NODE: 65
N: 4368
1: 57.2%
0: 42.8%
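The four rules above can be read as a small lookup function. The following is only a sketch of that reading: the names `Leaf` and `classify` are hypothetical, the thresholds, node numbers, counts, and percentages are copied from the listing, and the meaning of the class labels 1 and 0 is not stated in this excerpt, so they are kept as given.

```python
from typing import NamedTuple, Optional


class Leaf(NamedTuple):
    """One terminal node from the rule listing (hypothetical container)."""
    node: int           # NODE number from the listing
    n: int              # N: number of records in the node
    pct_class_1: float  # percentage labelled 1 in the node
    pct_class_0: float  # percentage labelled 0 in the node


def classify(age: float, sex: int, charlson: float) -> Optional[Leaf]:
    """Return the leaf matching one of the four listed rules, or None.

    Only the four rules shown in the listing are encoded; any record
    that falls outside them returns None.
    """
    # Rules for nodes 60 and 61: sex indicator 1 and charlson < 0.5
    if sex == 1 and charlson < 0.5:
        if 73.5 <= age < 80.5:
            return Leaf(60, 2030, 46.3, 53.7)
        if 80.5 <= age < 86.5:
            return Leaf(61, 2038, 54.4, 45.6)
    # Rules for nodes 64 and 65: age band 57.5 to 70.5, split on charlson
    if 57.5 <= age < 70.5:
        if 0.5 <= charlson < 1.5:
            return Leaf(64, 5913, 48.4, 51.6)
        if 1.5 <= charlson < 2.5:
            return Leaf(65, 4368, 57.2, 42.8)
    return None
```

For example, `classify(75, 1, 0)` falls in the first rule and returns the leaf for node 60, while a record outside all four rules, such as `classify(50, 1, 0)`, returns `None`.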
When testing for a rare occurrence, it is better to use a random sample of the non-occurrences, although
this is rarely done (D'Hoore, Bouckaert, & Tilquin, 1996). Logistic regression works most effectively
when the group sizes are fairly equal. In addition, given the size of the dataset, we can split the data into
a training set and a holdout sample. Table 9 gives the results on the holdout sample using a 50/50 split
of mortality to non-mortality. Note the difference in accuracy compared to that given in Table 8.
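The sampling idea described above can be sketched in a few lines: keep every occurrence, draw a random sample of the non-occurrences to reach a chosen ratio, then split the result into training and holdout sets. This is a minimal illustration rather than the procedure actually used for Tables 8 and 9. The function names, the `died` field, the 30% holdout fraction, and the synthetic records are all assumptions made for the example.

```python
import random


def balanced_sample(records, is_event, event_share=0.5, seed=0):
    """Keep all events and randomly sample non-events so that events
    make up roughly `event_share` of the returned list."""
    rng = random.Random(seed)
    events = [r for r in records if is_event(r)]
    non_events = [r for r in records if not is_event(r)]
    # Number of non-events needed for the requested event share
    n_non = round(len(events) * (1 - event_share) / event_share)
    n_non = min(n_non, len(non_events))  # cannot draw more than exist
    sample = events + rng.sample(non_events, n_non)
    rng.shuffle(sample)
    return sample


def train_holdout_split(records, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, holdout)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


# Synthetic data (assumption): 100 deaths among 5000 records
records = [{"died": i < 100} for i in range(5000)]
balanced = balanced_sample(records, lambda r: r["died"], event_share=0.5)
train, holdout = train_holdout_split(balanced, holdout_fraction=0.3)
```

Changing `event_share` to 0.25 or 0.10 produces samples with the less balanced mortality ratios discussed in the text.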
With this split, the model accurately predicts a much higher percentage of the actual mortality in the
dataset. As the disparity between the groups increases, the accuracy of predicting actual mortality
decreases. Consider the holdout sample with mortality decreased to 25% of the whole; the model correctly
predicts less than 6% of that 25% (Table 10).
When mortality is reduced to 10%, the prediction of actual mortality decreases to less than 0.2%
(Table 11).
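A short calculation shows why overall accuracy can stay high while almost no actual mortality is predicted. Overall accuracy is a prevalence-weighted average of the accuracy in each group. The sketch below uses the upper bounds quoted above (6% and 0.2%) for the share of actual mortality predicted. The accuracy on non-mortality is not given in this excerpt, so the value 1.0 is purely an assumption for illustration, and `overall_accuracy` is a hypothetical helper name.

```python
def overall_accuracy(prevalence, sensitivity, specificity):
    """Overall accuracy as a weighted average of the two group accuracies.

    prevalence  -- share of the sample that is mortality
    sensitivity -- share of actual mortality correctly predicted
    specificity -- share of actual non-mortality correctly predicted
    """
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Assumed specificity of 1.0; sensitivities are the bounds quoted in the text
acc_25 = overall_accuracy(prevalence=0.25, sensitivity=0.06, specificity=1.0)
acc_10 = overall_accuracy(prevalence=0.10, sensitivity=0.002, specificity=1.0)

print(f"25% mortality: overall accuracy about {acc_25:.1%}")
print(f"10% mortality: overall accuracy about {acc_10:.1%}")
```

Under these assumptions the overall accuracy is roughly 76% with 25% mortality and roughly 90% with 10% mortality. The overall figure therefore rises as the event becomes rarer, even though the share of actual mortality predicted falls toward zero.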
If the fact that mortality is a rare occurrence is not taken into account, the results will look exceptionally
good (D'Hoore et al., 1996). However, such results are very misleading. Similarly, if we look at the pre-