Chapter 8
Text Mining and Patient Severity Clusters
Introduction
The problem with using diagnosis codes is that there are simply too many of them to include all of them
in a predictive model or regression. A predictive model requires that categorical data have only a small
number of levels, so the number of levels in the variable must be compressed. Thus far, this has been
done in one of two ways. Either a predetermined list of codes counts toward risk adjustment, leaving
many codes excluded (as in the case of the Charlson Index), or consensus panels determine categories
of severity (as in the case of the APRDRG Index). We have shown that in many cases some of the
omitted codes carry as much risk as, if not more than, the codes that are included; patients with the
omitted conditions will therefore be identified as less severe than patients with included conditions.
In this chapter, we introduce a method that compresses the diagnoses into clusters while still using all
of the codes and without relying upon consensus panels. Moreover, because outcomes are not used to
define the severity index, they can be used to validate the model and then to assess the quality of
providers.
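The idea of compressing many diagnosis codes into a few clusters without consulting outcomes can be illustrated with a small sketch. It is not the chapter's actual procedure, which this excerpt does not spell out. It assumes each patient's code list is treated as a short text document, weighted with TF-IDF, and grouped with a simple spherical k-means. The patient records, the function names, and the choice of two clusters are all hypothetical.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a sparse vector (dict of code -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {c: w / norm for c, w in vec.items()} if norm else dict(vec)

def tfidf_vectors(patients):
    """Treat each patient's code list as a document and weight it by TF-IDF."""
    n = len(patients)
    doc_freq = Counter(code for codes in patients for code in set(codes))
    vectors = []
    for codes in patients:
        counts = Counter(codes)
        vec = {c: tf * (math.log(n / doc_freq[c]) + 1.0) for c, tf in counts.items()}
        vectors.append(normalize(vec))
    return vectors

def dot(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(c, 0.0) for c, w in a.items())

def cluster(vectors, k, max_iter=50):
    """Spherical k-means with a deterministic farthest-first start (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the patient least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)  # sum the member vectors
            if total:
                centroids[j] = normalize(total)
    return labels

# Hypothetical patients, each a list of diagnosis code strings.
patients = [
    ["428.0", "414.01", "401.9"],
    ["428.0", "414.01"],
    ["486", "491.21", "518.81"],
    ["486", "518.81"],
    ["414.01", "401.9", "428.0", "250.00"],
    ["491.21", "486"],
]

labels = cluster(tfidf_vectors(patients), k=2)
print(labels)  # -> [0, 0, 1, 1, 0, 1]
```

Every code contributes to the cluster assignment, yet the resulting variable has only two levels, and no outcome data enters the computation. The cluster label could then serve as the categorical input to a predictive model in place of the raw codes.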
Perhaps the major reason to use the modeling described here is that the methodology does not require
the diagnosis codes to be independent, as regression models do; in fact, the modeling