Information Technology Reference
The assumption of randomness is still not satisfied; as Figure 12 shows, the transformation actually makes
the situation worse.
We investigate the nature of these residuals for the patients with COPD. Recall that the residuals need
to be independent and identically distributed normal variables with mean zero and constant variance.
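One way to fit the regression of total charges on length of stay and check these residual assumptions in SAS is sketched below. It is a minimal sketch, not the author's code. It assumes the data set sasuser.charlsonsmallersample and the variables los and totchg from the PROC KDE step in this section. The output data set work.regout and the variable names pred and resid are hypothetical. Because the model has an intercept, the least-squares residuals average to zero by construction, so the substantive checks are normality and constant variance. Independence cannot be confirmed from these plots; it depends on how the patients were sampled.

```sas
/* Sketch: work.regout, pred and resid are hypothetical names */
proc reg data=sasuser.charlsonsmallersample;
   model totchg = los;                     /* R-Square is printed with the fit statistics */
   output out=work.regout p=pred r=resid;  /* save predicted values and raw residuals */
run;
quit;

/* Normality: formal normality tests, histogram, and normal Q-Q plot */
proc univariate data=work.regout normal;
   var resid;
   histogram resid / normal;
   qqplot resid / normal(mu=est sigma=est);
run;

/* Constant variance: residuals should form an even band around zero */
proc sgplot data=work.regout;
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;
```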
If we regress the total charges on the length of stay, the r² value is approximately 47%. We want to use
bivariate kernel density estimation to examine the basic assumptions. Figure 13 gives the density of total
charges and length of stay. To find this bivariate density, we use the following code:
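ods graphics on;  /* PLOTS= output requires ODS Graphics */
proc kde data=sasuser.charlsonsmallersample;
   /* grid limits: 0 to 30 days for LOS, $0 to $50,000 for charges;
      bwm=10 multiplies the default bandwidth tenfold (heavy smoothing) */
   bivar los (gridl=0 gridu=30) totchg (gridl=0 gridu=50000)
         / bwm=10 out=sasuser.kdebivar4 plots=all;
run;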
This density is not normal: it is skewed along both the length-of-stay axis and the total-charges axis.
The residual plots in Figures 10 and 11 become widely scattered and unpredictable precisely in these
skewed regions. We therefore also examine the linear regression on truncated values (length of stay of
0 to 5 days; total charges of $0 to $20,000). Because the density is skewed in both directions, it will be
very difficult to define a regression that satisfies all of the required assumptions.
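The skewness claim and the truncated regression can be checked with a short sketch like the following. It assumes the data set and variable names from the PROC KDE step above; work.regtrunc is a hypothetical output name. Positive SKEWNESS values from PROC MEANS indicate a long right tail. The WHERE statement restricts the fit to stays of 0 to 5 days and charges of $0 to $20,000, and it also drops rows with missing values, since SAS treats a missing value as smaller than 0. The residuals saved in work.regtrunc can then be examined with PROC UNIVARIATE and a residual-versus-predicted plot, as for the full data.

```sas
/* Skewness of each variable; positive values indicate a long right tail */
proc means data=sasuser.charlsonsmallersample n mean median skewness;
   var los totchg;
run;

/* Regression on the truncated ranges; work.regtrunc is a hypothetical name */
proc reg data=sasuser.charlsonsmallersample;
   where 0 <= los <= 5 and 0 <= totchg <= 20000;
   model totchg = los;
   output out=work.regtrunc p=pred r=resid;
run;
quit;
```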
Figure 14 gives cross sections of the density surface in Figure 13, showing the density of total charges
for stays of 1, 5, and 10 days. For 1 day, the peak occurs at $5000; it increases to $12,000 for 5 days.
At 10 days, the variability is so large that no peak can be discerned.
Similarly, Figure 15 gives cross sections at total charges of $5000, $10,000, and $15,000. In contrast to
Figure 14, the peaks all occur at the same length of stay, 2.5 days. This shows that, without knowing
something about a patient's condition or the procedures performed, the cost can be highly variable.
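The cross sections in Figures 14 and 15 can be approximated from a gridded PROC KDE estimate. The sketch below is one possible approach, not necessarily how the figures were produced. It re-runs PROC KDE with the same bandwidth multiplier, choosing NGRID= so that grid points fall on multiples of 0.5 day (so a peak at 2.5 days can be resolved) and $1,000. It then keeps only the rows at the slice values. It assumes that the OUT= data set stores the grid coordinates in VALUE1 (length of stay) and VALUE2 (total charges) and the estimate in DENSITY. work.kdegrid and the other work data sets are hypothetical names. Along a slice with length of stay held fixed, the joint density is proportional to the conditional density of charges, so the row with the largest DENSITY marks the peak charge.

```sas
/* Finer grid: LOS in 0.5-day steps (61 points), charges in $1,000 steps (51 points) */
proc kde data=sasuser.charlsonsmallersample;
   bivar los (gridl=0 gridu=30 ngrid=61) totchg (gridl=0 gridu=50000 ngrid=51)
         / bwm=10 out=work.kdegrid;
run;

/* Keep the slices; ROUND removes floating-point noise in the grid values */
data work.los_slices work.chg_slices;
   set work.kdegrid;
   los_grid = round(value1, 0.5);
   chg_grid = round(value2, 1000);
   if los_grid in (1, 5, 10) then output work.los_slices;
   if chg_grid in (5000, 10000, 15000) then output work.chg_slices;
run;

/* Peak charge for each length of stay (compare Figure 14) */
proc sort data=work.los_slices;
   by los_grid descending density;
run;
data work.los_peaks;
   set work.los_slices;
   by los_grid;
   if first.los_grid;   /* first row = highest density in the slice */
run;
proc print data=work.los_peaks noobs;
   var los_grid chg_grid density;
run;

/* Density against length of stay at fixed charges (compare Figure 15) */
proc sort data=work.chg_slices;
   by chg_grid los_grid;
run;
proc sgplot data=work.chg_slices;
   series x=los_grid y=density / group=chg_grid;
run;
```

For the 10-day slice, the reported maximum may be barely higher than its neighbors, which is the numerical counterpart of a peak that cannot be discerned.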
Figure 13. Bivariate density function of total charges and length of stay