Table 23. Scored diagnosis clusters for 2004 data
Cluster   Number of MRSA Patients      Cluster   Number of MRSA Patients
Number    Predicted into Cluster       Number    Predicted into Cluster
1         98                           17        365
2         623                          18        64
3         66                           19        100
4         191                          20        320
5         71                           21        94
6         48496                        22        134
7         447                          23        72
8         1971                         24        700
9         86                           25        78
10        225                          26        469
11        260                          27        546
12        145                          28        477
13        210                          29        2680
14        105                          30        27
15        55                           31        181
16        316                          32        451
Diabetes with other specified manifestations (of unspecified type), Other bone involvement in diseases
classified elsewhere. While these patients are not at the highest level of MRSA occurrence, they are at a
very high level. This suggests that, at the least, infection control procedures should be used on this class
of patients to reduce the occurrence of MRSA. Another large number of infections is predicted into
cluster 29: Excessive or frequent menstruation; Other specified rehabilitation procedure; Acute
posthemorrhagic anemia; Organ or tissue replaced by other means, knee joint; Iron deficiency anemias,
secondary to blood loss; Diverticulosis of colon (without mention of hemorrhage). These patients,
however, have a much lower occurrence of MRSA, and prophylactic treatment is probably unnecessary.
We next use predictive modeling. First, we need to define a profit/loss matrix to quantify the difference
between a positive prediction of MRSA and a negative prediction. Then, we need to partition the
sampled dataset into training, validation, and testing subsets. We also fit several models to determine
the model of best fit (P. B. Cerrito, 2007). Since we have over-sampled the occurrence of MRSA, we also
need to set the prior probabilities of occurrence and non-occurrence to those that exist in the population.
In this first example, the procedure and diagnosis clusters are defined as interval variables.
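The prior adjustment for over-sampling can be illustrated with a minimal sketch. It assumes the standard rescaling of a predicted probability by the ratio of population to sample class proportions; the function name and the 1% population rate are hypothetical and serve only as an illustration, not as the software's own procedure or a value from the data.

```python
def adjust_for_oversampling(p_sample, sample_prior, population_prior):
    """Rescale a probability estimated on over-sampled data to the
    class proportions of the full population."""
    # Weight each class by the ratio of its population share to its sample share.
    pos = p_sample * population_prior / sample_prior
    neg = (1.0 - p_sample) * (1.0 - population_prior) / (1.0 - sample_prior)
    return pos / (pos + neg)


# Hypothetical values: a 50/50 over-sample and an assumed 1% population rate.
adjusted = adjust_for_oversampling(p_sample=0.5, sample_prior=0.5, population_prior=0.01)
print(adjusted)
```

Under these assumed numbers, a patient scored at 50% in the balanced sample is rescaled to roughly the population rate, which is why the priors must be reset before costs are compared.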
The use of a profit/loss matrix will not change the model results; it will change only the choice of which
model is optimal. The default criterion for “best” fit is the misclassification rate; this can be changed to
select the model with the highest profit (or minimal loss). Therefore, by changing the profit/loss matrix,
we can perform a sensitivity analysis to determine when the optimal model choice will change. We use the
available patient demographics as well as the categories of patient diagnoses and procedures. Using
the misclassification rate alone, the optimal model is a regression with a 50% misclassification rate. Since
we used a 50/50 split in the data, a 50% misclassification rate is a poor fit, no better than random
assignment. If the cost of treatment is 10 times the cost of prophylactic treatment, the best model is a
neural network with an overall average cost less
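The sensitivity analysis on the profit/loss matrix can be sketched as follows. This is only an illustration under stated assumptions: the confusion counts for the two models are hypothetical, not the results of this study, and the loss matrix assumes that every flagged patient incurs one unit of prophylaxis cost while every missed MRSA case incurs the full treatment cost.

```python
def average_loss(confusion, loss):
    """Mean loss per case; both arguments are keyed by (actual, predicted)."""
    total = sum(confusion.values())
    return sum(confusion[k] * loss[k] for k in confusion) / total


def best_model(models, loss):
    """Name of the model with the smallest average loss."""
    return min(models, key=lambda name: average_loss(models[name], loss))


# Hypothetical confusion counts on a validation set: (actual, predicted) -> cases.
models = {
    "regression": {(1, 1): 30, (1, 0): 20, (0, 1): 10, (0, 0): 40},
    "neural_network": {(1, 1): 45, (1, 0): 5, (0, 1): 35, (0, 0): 15},
}

# Sweep the assumed ratio of treatment cost (missed MRSA) to prophylaxis cost.
for ratio in (1, 2, 5, 10):
    loss = {(1, 1): 1, (0, 1): 1, (1, 0): ratio, (0, 0): 0}
    print(ratio, best_model(models, loss))
```

With these made-up counts, the regression has the lower misclassification rate, yet the preferred model switches once missed cases become expensive enough. The fitted models themselves do not change; only the selection among them does.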