This decision tree shows that it is possible to get a model even while misidentifying variables. The
diagnosis and procedure clusters are nominal data and should be identified as such. Once inequalities
on the cluster numbers are noticed in the decision tree, the variable definitions should be corrected and
the model redone.
However, if this model is used, almost 95% of the patients are treated prophylactically. That is not a
reasonable result, so we look to find the patients at highest risk. For this, we use the lift function, shown
in Figure 32.
Figure 32 indicates that the lift is two or higher for the first decile of patients. The highest lift in this
decile comes from a regression model. This suggests that the approximately 10% of patients with the
highest predicted risk should definitely be treated prophylactically.
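As a rough illustration of how a lift value per decile can be computed, the sketch below ranks patients by predicted risk and divides the outcome rate in each decile by the overall rate. The function name and the toy data are hypothetical and are not taken from the study; the scores stand in for the output of any predictive model.

```python
def lift_by_decile(scores, labels, n_bins=10):
    """Return the lift in each bin, highest-scored patients first.

    scores: predicted risk per patient (hypothetical model output)
    labels: 1 if the patient had the outcome (for example MRSA), else 0
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("no positive cases; lift is undefined")

    # Rank patients from highest to lowest predicted risk.
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    lifts = []
    for b in range(n_bins):
        start, end = b * n // n_bins, (b + 1) * n // n_bins
        bin_labels = ranked[start:end]
        if not bin_labels:
            lifts.append(float("nan"))  # fewer patients than bins
            continue
        # Lift = outcome rate inside the bin / outcome rate overall.
        lifts.append((sum(bin_labels) / len(bin_labels)) / base_rate)
    return lifts


# Toy data (illustrative only): 20 patients, 4 with the outcome.
toy_scores = list(range(20, 0, -1))
toy_labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1] + [0] * 10
toy_lifts = lift_by_decile(toy_scores, toy_labels)
```

A lift of two in the first decile means that the outcome is twice as common among the highest-scored 10% of patients as in the population as a whole.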
While the National Inpatient Sample does not include prescription information, there are two procedures
related to antibiotics: 99.21 (infusion of antibiotic) and 00.14 (infusion of linezolid). Surprisingly,
only 58 of the MRSA patients had either of these procedures listed in any of the 15 columns of procedure
codes. This was an unexpected finding in the data. Either the patients are not treated with antibiotics,
or the use of antibiotics is substantially under-reported in the hospital environment. Data mining can
help you to investigate results that you do not anticipate. The use of antibiotics for resistant infection
requires additional study.
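The search across the 15 procedure columns can be sketched as follows. This is a minimal illustration under stated assumptions: the column names PR1 through PR15, the storage of codes as strings, and the helper names are all hypothetical and should be checked against the actual data layout.

```python
# Codes 99.21 and 00.14 from the text, with the decimal point removed.
ANTIBIOTIC_CODES = {"9921", "0014"}


def normalize(code):
    """Strip whitespace and the decimal point so '99.21' matches '9921'."""
    return str(code).replace(".", "").strip()


def has_antibiotic_procedure(record, n_columns=15, prefix="PR"):
    """True if any procedure column of the record holds one of the codes.

    record: a dict mapping column names (assumed PR1..PR15) to code strings
    """
    for i in range(1, n_columns + 1):
        value = record.get(f"{prefix}{i}")
        if value is not None and normalize(value) in ANTIBIOTIC_CODES:
            return True
    return False


def count_treated(records):
    """Count the records with at least one antibiotic procedure code."""
    return sum(1 for record in records if has_antibiotic_procedure(record))


# Toy records (illustrative only, not real discharge data).
toy_records = [
    {"PR1": "99.21"},
    {"PR1": "3995", "PR3": "0014"},
    {"PR1": "3995"},
    {},
]
toy_count = count_treated(toy_records)
```

Codes are kept as strings here because a numeric column would lose the leading zeros of 00.14.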
One of the confounding factors with the occurrence of resistant infection is the practice of infection
control in hospitals and the variability in adherence to infection control procedures. Even so, once
different groups of procedures can be used to predict the occurrence of resistant infection, steps can be
taken to reduce that occurrence through prevention in the form of prophylactic antibiotics, or
through increased adherence to infection control. Treatment procedures related to dialysis, for example, have
a much higher risk of infection, and this knowledge can be used to reduce the problem.
Assuming That the Clusters Are Nominal Data
More properly, the clusters are nominal since there is no real ordering in the variables. We could renumber
the clusters by the risk of MRSA to make them ordinal (but not interval). The predictive model then
changes to reflect the change in the clusters. We use the variable definitions as shown in Figure 33. We
again use a stratified sample with a 50/50 split.
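Renumbering the clusters by risk, as mentioned above, could look like the following sketch. The function name and the toy data are hypothetical; the idea is only that each cluster label is replaced by its rank when the clusters are sorted by observed MRSA rate, which yields an ordinal (but still not interval) variable.

```python
from collections import defaultdict


def rank_clusters_by_risk(cluster_ids, outcomes):
    """Map each cluster label to its rank by observed outcome rate.

    cluster_ids: cluster label per patient (nominal)
    outcomes:    1 if the patient had MRSA, else 0
    Returns a dict {original_label: rank}, where rank 1 is the lowest risk.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cluster, outcome in zip(cluster_ids, outcomes):
        totals[cluster] += 1
        positives[cluster] += outcome

    rates = {c: positives[c] / totals[c] for c in totals}
    # Sort by rate; ties are broken by the original label for determinism.
    ordered = sorted(rates, key=lambda c: (rates[c], c))
    return {c: rank for rank, c in enumerate(ordered, start=1)}


# Toy data (illustrative only): four clusters with different MRSA rates.
toy_clusters = [7, 7, 5, 5, 5, 5, 3, 3, 3, 3, 2, 2]
toy_outcomes = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_mapping = rank_clusters_by_risk(toy_clusters, toy_outcomes)
toy_ordinal = [toy_mapping[c] for c in toy_clusters]
```

The ranks carry order only. The gap between ranks 1 and 2 says nothing about how much the risk differs, which is why the renumbered clusters are not interval data.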
The modified ROC Curve is given in Figure 34. It appears to have the same accuracy level as that
in Figure 31.
The corresponding decision tree is given in Figure 35. Note that the clusters are separated individually
in the tree in contrast to the tree shown in Figure 30. It is also a simpler tree compared to that in Figure
30. The first split is on clusters 7 and 5, which have a very high incidence of MRSA. The next split is
on diagnosis clusters 3 and 2, which have much lower levels of MRSA. Length of stay also contributes,
with a stay of longer than 3 days having a higher risk of MRSA.
Figure 32. Lift Function for MRSA prediction