Information Technology Reference
In-Depth Information
diagnoses and procedures associated with MRSA. There were a total of 5974 patients with a diagnosis
of V09.0, or resistant infection, in the dataset. Code translations are also given (Anonymous-ICD9).
Approximately two-thirds of the patients diagnosed with resistant infection are also diagnosed with
Staphylococcus aureus, indicating MRSA; the remainder have other diagnoses, or none at all. Hypertension
is identified in 27% of the patients, followed by urinary tract infection in 17%. Therefore, all but one
of the top 20 diagnoses occur in only a small proportion of patients; they can be considered rare
occurrences. The procedures occur in an even smaller proportion of the patients, with 28% receiving
venous catheterization, followed by other incision with drainage of skin and subcutaneous tissue in 13%.
Since each individual diagnosis or procedure occurs in just a small proportion of patients, it is
important to consider all possible combinations of diagnoses and procedures in order to identify all
patients. Without some type of data compression to reduce these codes to a reasonable number of
categories, they cannot be used in any predictive model. We use text mining to perform this task. Text
mining can use the linkage between diagnoses and procedures to define categories of patients (P. Cerrito,
2007; P. B. Cerrito, 2007). As before, in order to use text mining, we concatenate the fifteen columns of
diagnosis codes into one text string, and then concatenate the fifteen columns of procedure codes into a
second text string. We then use the linkage between the concatenated codes in each text string to define
a set of clusters of patient conditions, and a second set of clusters of procedures.
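The concatenation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names DX1 to DX15 and PR1 to PR15, the function name concatenate_codes, and the sample record are hypothetical and do not come from the dataset. The text-mining and clustering step itself is not shown.

```python
def concatenate_codes(record, prefix, n_columns=15):
    """Join the non-empty code columns of one patient record into one string."""
    codes = []
    for i in range(1, n_columns + 1):
        # Missing columns and blank values are treated the same way: skipped.
        value = (record.get(f"{prefix}{i}") or "").strip()
        if value:
            codes.append(value)
    return " ".join(codes)


# Hypothetical record: column names and code values are illustrative only.
patient = {
    "DX1": "V09.0", "DX2": "041.11", "DX3": "401.9",
    "PR1": "38.93", "PR2": "86.04",
}

dx_text = concatenate_codes(patient, "DX")  # one string of diagnosis codes
pr_text = concatenate_codes(patient, "PR")  # one string of procedure codes
```

Each resulting string can then be treated as a short document whose "words" are codes, so that codes appearing together across patients can be linked by a text-mining tool.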
Another problem is that without some type of stratified sampling in the data, predictive models such
as logistic regression cannot be used to predict the rare occurrence of infection. The constructed model
will simply assign all, or nearly all, patients to the non-occurrence group, so that the model is
accurate but of little value. We use a stratified sample of 100% of the occurrence group and randomly
select an equal number of non-occurrence patients for a 50/50 split in the resulting dataset. Then we use
several predictive models, including logistic regression, neural networks, and decision trees, to optimize
results. While the overall model may lack accuracy, we can use the lift to determine which
patients should be treated prophylactically to reduce the occurrence of resistant infection. Lift allows us
to find the patients at highest risk of occurrence, and with the greatest probability of accurate prediction.
This is especially important since these are the patients for whom we would want to take the greatest care.
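The 50/50 sampling described above might look like the following minimal sketch. The function name balanced_sample, the record layout, and the synthetic patient list are assumptions made for illustration; the original analysis may well have used a different tool for this step.

```python
import random


def balanced_sample(records, is_case, seed=0):
    """Keep every occurrence and draw an equal number of non-occurrences."""
    cases = [r for r in records if is_case(r)]
    controls = [r for r in records if not is_case(r)]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # Sample without replacement; cap at the number of available controls.
    sampled_controls = rng.sample(controls, k=min(len(cases), len(controls)))
    return cases + sampled_controls


# Synthetic data (made-up figures): 20 infected patients out of 1000.
patients = [{"id": i, "infected": i < 20} for i in range(1000)]
sample = balanced_sample(patients, lambda r: r["infected"])
```

Because the sample no longer reflects the true prevalence, predicted probabilities from a model fit to it are inflated relative to the full population; the ranking of patients, which is what lift relies on, is what carries over.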
Given a lift function, we can decide on a decile cutpoint so that we can predict the high risk patients
above the cutpoint, and predict the low risk patients below a second cutpoint, while declining to make
a definite prediction for those in the center. In that way, we can dismiss those who have no risk, and
aggressively treat those at highest risk. We can then decide somewhere in the center when to stop
prophylactic treatment. That cutpoint will depend largely upon the difference between the cost of
treatment after infection and the cost of treatment to prevent infection. Lift allows us to distinguish
between patients without assuming a uniformity of risk.
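As a rough illustration of decile lift and the two-cutpoint rule, the sketch below ranks patients by a model score, computes lift per decile (event rate in the decile divided by the overall event rate), and labels each decile. The function names, the cutpoint values (top 2 deciles, bottom 3 deciles), the labels, and the synthetic scores are all assumptions chosen for the example, not values from the study. The sketch also assumes at least ten patients and at least one occurrence.

```python
def decile_lift(scores, outcomes):
    """Return lift for each decile, highest-scored decile first."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(outcomes) / n
    lifts = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        rate = sum(y for _, y in chunk) / len(chunk)
        lifts.append(rate / overall_rate)
    return lifts


def decide(decile, high_cut=2, low_cut=7):
    """Three-way rule: decile 0 is the highest-risk tenth of patients."""
    if decile < high_cut:
        return "treat"      # above the upper cutpoint: high risk
    if decile >= low_cut:
        return "dismiss"    # below the lower cutpoint: low risk
    return "undecided"      # center: no definite prediction


# Synthetic example: 100 patients, the 20 highest scores are occurrences.
scores = [i / 100 for i in range(100)]
outcomes = [1 if i >= 80 else 0 for i in range(100)]
lifts = decile_lift(scores, outcomes)
decisions = [decide(d) for d in range(10)]
```

In practice the two cutpoints would be moved toward or away from the center according to the relative cost of preventive treatment versus treatment after infection.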
Table 21 gives a total of 32 clusters of patient diagnoses associated with MRSA; Table 22 gives a
corresponding total of 31 clusters of patient procedures. Unlike the previous severity indices, where we
restricted the number of clusters, we can expand the number to investigate infection. We use a stratified
sample of all patients with MRSA and a random sample of patients without MRSA to define the text
clusters. Both sets of clusters were identified using text mining.
Not every patient in a cluster will have all of the diagnoses listed in the cluster. These diagnoses are
those that characterize the cluster. That means that the linkage between the diagnoses has a fairly high
probability of occurring.