Text Mining and Patient Severity Clusters - Text Mining Techniques for Healthcare Provider Quality Determination



Figure 36. Lift function for revised prediction model

Figure 36 gives the lift function for the testing set. It indicates that MRSA can be predicted for the first four deciles, so the high-risk patients can now be identified in those deciles. Moreover, the patients in the last three deciles have a low risk of MRSA and would not require any prophylactic treatment.
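As a rough illustration of how a lift function by decile can be computed, the following sketch ranks patients by predicted risk, splits them into ten equal groups, and divides each group's event rate by the overall event rate. The function name `decile_lift` and the scores and outcomes are hypothetical; they are not the chapter's MRSA model or data.

```python
def decile_lift(scores, outcomes, n_bins=10):
    """Return the lift per bin, ordered from highest to lowest predicted risk."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if len(scores) < n_bins:
        raise ValueError("need at least one patient per bin")

    overall_rate = sum(outcomes) / len(outcomes)
    if overall_rate == 0:
        raise ValueError("lift is undefined when no events occur")

    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)

    n = len(ranked)
    lifts = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        bin_outcomes = [y for _, y in ranked[start:end]]
        bin_rate = sum(bin_outcomes) / len(bin_outcomes)
        # Lift above 1 means the bin has more events than average.
        lifts.append(bin_rate / overall_rate)
    return lifts


# Hypothetical data: 20 patients, 5 of whom have the condition.
scores = [i / 20 for i in range(20, 0, -1)]
outcomes = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

lifts = decile_lift(scores, outcomes)
for decile, lift in enumerate(lifts, start=1):
    print(f"decile {decile}: lift {lift:.1f}")
```

In this made-up example the first four deciles have a lift above 1 and the remaining deciles have a lift of 0, which is the kind of pattern the text describes for Figure 36.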

Future Trends

This chapter demonstrates the use of text mining to define patient severity indices. It gives results that are far superior to those of the indices currently in use, which are defined using logistic regression. A future trend will be to convert more indices to text mining analysis. Because text mining is not subject to the problems of regression models, it is not as susceptible to upcoding by providers (links between codes become more important than the addition of codes). For this reason, text-based severity measures should take the place of regression-based measures. Because text mining takes advantage of the linkage between conditions, it does not require independence between codes. Because text is considered to be somewhat unique, the technique also does not have to assume uniformity of data entry. The entry is not uniform, and we can compare differences in coding across providers as well as differences across patients.

However, the practice of comparing the predicted rate of mortality to the actual rate of mortality should be reconsidered as well. It penalizes hospitals with very low rates of actual mortality and rewards hospitals with very high rates of mortality because there can be a higher differential between actual and predicted rates when the actual rate is high. While it can be very satisfactory to develop a “scorecard” to rank providers, quality of care is multi-dimensional, and it should be examined in multiple dimensions.
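The arithmetic behind this concern can be sketched with two hypothetical hospitals. The rates below are invented for illustration only. The actual-to-predicted ratio is included purely for contrast; it is not a measure this chapter proposes.

```python
# Hypothetical mortality rates (proportions), chosen only to illustrate the point.
hospitals = {
    "A": {"actual": 0.010, "predicted": 0.015},  # very low actual mortality
    "B": {"actual": 0.100, "predicted": 0.120},  # high actual mortality
}


def differential(rates):
    """Predicted minus actual mortality; larger looks 'better' on a scorecard."""
    return rates["predicted"] - rates["actual"]


def ratio(rates):
    """Actual divided by predicted mortality; smaller means fewer deaths than expected."""
    return rates["actual"] / rates["predicted"]


for name, rates in hospitals.items():
    print(
        f"hospital {name}: differential {differential(rates):.3f}, "
        f"ratio {ratio(rates):.2f}"
    )

# Ranking by differential favors the hospital with the higher actual rate,
# because a low actual rate leaves little room for a large difference.
best_by_differential = max(hospitals, key=lambda h: differential(hospitals[h]))
best_by_ratio = min(hospitals, key=lambda h: ratio(hospitals[h]))
print("best by differential:", best_by_differential)
print("best by ratio:", best_by_ratio)
```

Under these assumed numbers, hospital B ranks first on the differential even though its actual mortality is ten times that of hospital A, while hospital A performs better relative to its own prediction.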

While the National Inpatient Sample, used here, structures the datafile so that the text strings of diagnosis and procedure codes are fairly simple to define, we will show in the next chapter how they can be defined using more complex claims data.
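One way such a text string might be built from a record with one column per code is sketched below. The helper `codes_to_text`, the column names (`DX1`, `PR1`, and so on), and the code values are assumptions made for illustration; the chapter does not specify this layout or procedure.

```python
def codes_to_text(record, prefixes=("DX", "PR")):
    """Join the non-empty code columns of one patient record into a single string."""
    tokens = []
    for prefix in prefixes:
        # Collect columns such as DX1, DX2, ... and keep them in numeric order.
        columns = sorted(
            (
                key
                for key in record
                if key.startswith(prefix) and key[len(prefix):].isdigit()
            ),
            key=lambda key: int(key[len(prefix):]),
        )
        for column in columns:
            value = (record[column] or "").strip()
            if value:  # skip empty or missing code slots
                tokens.append(value)
    return " ".join(tokens)


# Hypothetical record: two diagnosis codes, one procedure code, two empty slots.
patient = {"DX1": "25000", "DX2": "4280", "DX3": "", "PR1": "3995", "PR2": None}

text = codes_to_text(patient)
print(text)
```

Each patient then becomes a short "document" of codes, so that combinations of codes, rather than individual code indicators, can be analyzed.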

Discussion

Text mining concentrates on the linkage between patient diagnoses and procedures, assuming that there are relationships. This assumption should be quite reasonable: procedures should be related to diagnoses, and there are related co-morbid conditions, such as diabetes and heart disease. The relationships are assumed to be textual rather than linear. Moreover, it is not necessary to assume the uniformity of data entry when defining a patient severity index using text mining.

An index defined using text mining can be used for purposes other than to rank the quality of provid-
