The Charlson Comorbidity Index - Text Mining Techniques for Healthcare Provider Quality Determination


IF 73.5 <= Age in years at admission < 80.5
AND Indicator of sex EQUALS 1
AND charlson < 0.5
THEN
NODE: 60
N: 2030
1: 46.3%
0: 53.7%

IF 80.5 <= Age in years at admission < 86.5
AND Indicator of sex EQUALS 1
AND charlson < 0.5
THEN
NODE: 61
N: 2038
1: 54.4%
0: 45.6%

IF 0.5 <= charlson < 1.5
AND 57.5 <= Age in years at admission < 70.5
THEN
NODE: 64
N: 5913
1: 48.4%
0: 51.6%

IF 1.5 <= charlson < 2.5
AND 57.5 <= Age in years at admission < 70.5
THEN
NODE: 65
N: 4368
1: 57.2%
0: 42.8%
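The four rules above can be read as a lookup from a patient record to a leaf node. The following is a minimal sketch of that reading, not the software that produced the rules. The function name `assign_node`, the argument names, and the `LEAVES` table are hypothetical; only the thresholds, node numbers, counts, and class-1 percentages are taken from the listing. Because only four leaves are shown, any record that matches none of them returns `None`.

```python
from typing import Optional

# Leaf statistics copied from the four rules above:
# node id -> (N, share of records in class 1).
LEAVES = {
    60: (2030, 0.463),
    61: (2038, 0.544),
    64: (5913, 0.484),
    65: (4368, 0.572),
}


def assign_node(age: float, sex: int, charlson: float) -> Optional[int]:
    """Return the leaf node for a record, or None if no listed rule applies.

    Hypothetical helper: it covers only nodes 60, 61, 64 and 65.
    """
    # Nodes 60 and 61: sex indicator 1, Charlson index below 0.5.
    if sex == 1 and charlson < 0.5:
        if 73.5 <= age < 80.5:
            return 60
        if 80.5 <= age < 86.5:
            return 61
    # Nodes 64 and 65: age band 57.5 to 70.5, no condition on sex.
    if 57.5 <= age < 70.5:
        if 0.5 <= charlson < 1.5:
            return 64
        if 1.5 <= charlson < 2.5:
            return 65
    return None


node = assign_node(age=82, sex=1, charlson=0)
n_records, class1_share = LEAVES[node]
```

For an 82-year-old with sex indicator 1 and a Charlson index of 0, the lookup lands in node 61, where 54.4% of the 2038 records are in class 1.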

When testing for a rare occurrence, it is better to use a random sample of the non-occurrences, although this is rarely done (D'Hoore, Bouckaert, & Tilquin, 1996). Logistic regression works most effectively when the group sizes are fairly equal. In addition, given the size of the dataset, we can split the data into a training set and a holdout sample. Table 9 gives the results on the holdout sample using a 50/50 split of mortality to non-mortality. Note the difference in accuracy compared to that given in Table 8.
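The sampling idea in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to build Tables 8 and 9: the function names `undersample` and `train_holdout_split`, the `died` field, the 30% holdout fraction, and the synthetic records are all hypothetical. The sketch keeps every occurrence, draws a random sample of the non-occurrences to reach a requested mix, and then splits the result into training and holdout sets.

```python
import random


def undersample(records, label_key="died", event_share=0.5, seed=0):
    """Keep every event and a random sample of non-events so that
    events make up roughly `event_share` of the returned list."""
    rng = random.Random(seed)
    events = [r for r in records if r[label_key] == 1]
    non_events = [r for r in records if r[label_key] == 0]
    # Number of non-events needed to reach the requested share of events.
    n_keep = round(len(events) * (1 - event_share) / event_share)
    n_keep = min(n_keep, len(non_events))
    sample = events + rng.sample(non_events, n_keep)
    rng.shuffle(sample)
    return sample


def train_holdout_split(records, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the records and cut it into training and holdout lists."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


# Synthetic data (hypothetical): 100 deaths among 1,000 admissions.
records = [{"died": 1} for _ in range(100)] + [{"died": 0} for _ in range(900)]

balanced = undersample(records, event_share=0.5)   # 50/50 mix
quarter = undersample(records, event_share=0.25)   # 25% mortality
train, holdout = train_holdout_split(balanced, holdout_fraction=0.3)
```

Changing `event_share` reproduces the kind of 50%, 25%, and 10% mixes the text compares, so the same model can be scored against each.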

With this split, the model accurately predicts a much higher percentage of the actual mortality in the dataset. As the disparity between the groups increases, the accuracy of predicting actual mortality decreases. Consider the holdout sample with mortality decreased to 25% of the whole; the actual mortality predicted is less than 6% of the 25% (Table 10).

When mortality is reduced to 10%, the prediction of actual mortality decreases to less than 0.2% (Table 11).
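The sketch below shows why overall accuracy can stay high while the prediction of actual mortality collapses. It assumes that "prediction of actual mortality" means the share of actual deaths the model flags (sensitivity). The function name `accuracy_and_sensitivity` and the confusion-matrix counts are hypothetical and are not taken from Tables 10 or 11.

```python
def accuracy_and_sensitivity(tp, fn, fp, tn):
    """Return (overall accuracy, sensitivity) from confusion-matrix counts.

    tp: deaths predicted as deaths      fn: deaths predicted as survivors
    fp: survivors predicted as deaths   tn: survivors predicted as survivors
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical holdout of 1,000 records with 10% mortality, where the model
# labels almost every record as a survivor.
acc, sens = accuracy_and_sensitivity(tp=1, fn=99, fp=0, tn=900)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}")
```

With these made-up counts, about 90% of records are classified correctly even though only 1 of the 100 deaths is identified.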

Without considering the fact that mortality is a rare occurrence, the results will look exceptionally good (D'Hoore et al., 1996). However, such results are very misleading. Similarly, if we look at the pre-
