Figure 16. Length of stay for charges of $20,000, $25,000, and $30,000
Example Using One Procedure of Cardiovascular Surgery
In this example, we examine all patients undergoing cardiovascular bypass surgery; this dataset was
previously discussed in Chapter 2. We also use a set of 17 patient diagnoses as independent variables.
We want to see whether we can predict length of stay and total charges, and whether the hospital
makes a difference, so we restrict our attention to the 1400 patients treated in ten different
hospitals. The list of independent variables is given in Table 11.
The model is highly statistically significant, with an r² value of 0.13, again indicating that most of
the variability in length of stay is still unaccounted for, even when we restrict the patient procedures
and conditions and limit the data to just ten hospitals. However, when the outcome variable is total
charges, the r² value is 0.51, indicating that the majority of the variability in charges is accounted
for by the independent variables. One reason is that hospitals tend to set their own charges. If we omit
hospital from the list of independent variables, the r² value falls to 0.074. Therefore, charges are
determined far more by hospital than by patient condition. This again demonstrates that there is no
uniformity in assessing charges across hospitals. Figure 17 gives the residual graph with total charges
as the dependent variable and hospital included as an independent variable. It clearly shows two
different groups in the data.
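The comparison of r² with and without hospital can be sketched as follows. This is a minimal illustration on synthetic data, not the bypass-surgery dataset: the hospital charge levels, diagnosis effects, noise scale, and all variable names are assumptions chosen only to reproduce the pattern described, and the resulting r² values will not match the figures quoted in the text.

```python
import numpy as np

def r_squared(X, y):
    """Return r^2 for an ordinary least-squares fit of y on X (intercept added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_patients, n_hospitals, n_diagnoses = 1400, 10, 17  # sizes taken from the text

# Synthetic stand-ins (assumed, NOT the book's data).
hospital = rng.integers(0, n_hospitals, size=n_patients)
diagnoses = rng.integers(0, 2, size=(n_patients, n_diagnoses)).astype(float)

# Assumption: each hospital has its own charge level, and diagnoses matter little.
hospital_level = rng.normal(30000.0, 8000.0, size=n_hospitals)
diagnosis_effect = rng.normal(0.0, 800.0, size=n_diagnoses)
charges = (hospital_level[hospital]
           + diagnoses @ diagnosis_effect
           + rng.normal(0.0, 6000.0, size=n_patients))

# Dummy-code hospital, dropping hospital 0 as the reference category.
dummies = (hospital[:, None] == np.arange(1, n_hospitals)).astype(float)

r2_with = r_squared(np.column_stack([diagnoses, dummies]), charges)
r2_without = r_squared(diagnoses, charges)

print(f"r^2 with hospital:    {r2_with:.3f}")
print(f"r^2 without hospital: {r2_without:.3f}")
```

Because the model without hospital is nested inside the model with hospital, its r² can never be higher; the size of the drop is what indicates how much of the variability in charges is tied to the hospital rather than to patient condition.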
Outliers in Regression
An assumption of normality is required for regression. If we assume normality when the distribution is
actually exponential or gamma, the outliers will be under-counted. Consider the dataset here, which is
not normally distributed. We use a random sample of 1000 observations. The mean and standard deviation
(assuming a normal distribution) are 4.235 and 5.0479 days, respectively. Three standard deviations
beyond the mean is then 19.3787 days, and two standard deviations beyond the mean is 14.3308 days. In
the random sample, the proportion of patients with a length of stay beyond two standard deviations is
3.5%, whereas the normal probability indicates that only 2.5% should be that large. The proportion
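The threshold arithmetic above can be checked with a short sketch. The mean and standard deviation are the values quoted in the text; the exponential distribution with the same mean is an assumption used only to illustrate how a skewed distribution places more mass in the upper tail than a normal model predicts. It is not a fit to the actual sample.

```python
import math

mean, sd = 4.235, 5.0479  # sample mean and standard deviation quoted in the text (days)

def normal_upper_tail(k):
    """P(X > mean + k*sd) when X is normally distributed."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def exponential_upper_tail(threshold, exp_mean):
    """P(X > threshold) when X is exponential with the given mean (assumed model)."""
    return math.exp(-threshold / exp_mean)

thresholds = {}
normal_tail = {}
exponential_tail = {}
for k in (2, 3):
    thresholds[k] = mean + k * sd
    normal_tail[k] = normal_upper_tail(k)
    exponential_tail[k] = exponential_upper_tail(thresholds[k], mean)
    print(f"k={k}: threshold={thresholds[k]:.4f} days, "
          f"normal tail={normal_tail[k]:.4%}, "
          f"exponential tail={exponential_tail[k]:.4%}")
```

At both cutoffs the exponential tail is larger than the normal tail, which is the sense in which a normality assumption under-counts outliers in skewed length-of-stay data.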