Under these assumptions, any patient is equally likely to enter any hospital to be treated. In reality, a patient usually enters
a hospital where their physician has privileges, so the randomization is not complete.
Moreover, we must assume that any patient's information is entered into
the billing record in exactly the same way regardless of the hospital chosen. Yet this assumption is never
put to the test by taking a group of patient records to multiple hospitals to see if the billing information
extracted is the same. In all likelihood, if such a test were performed, the different billing records would
vary considerably. In fact, it is widely known that different providers code differently.
Generally, regression models are used to define severity indices. However, a major problem with
traditional statistical methods is that they assume relatively small datasets, which generally do not reach
the 100,000 or 1,000,000 observations typically found in the large databases used for outcomes
research. When the datasets are so large, the p-value has little meaning and cannot be used to measure
the effectiveness of a statistical model; other measurements must be used instead. In this chapter, we
will discuss some of the statistical methods and some of the issues with large samples. In addition, we
will discuss the issues of model assumptions and model validity.
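To make the point about p-values concrete, the short simulation below is a sketch (assuming Python with NumPy and SciPy, which are not named in the text): with a million observations, even a clinically negligible difference between two groups produces a vanishingly small p-value.

```python
# Sketch: a two-sample t-test on a clinically negligible difference (0.01
# standard deviations). The effect size and sample sizes are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def p_value_for_tiny_effect(n):
    """p-value when the true difference in means is only 0.01 SD."""
    control = rng.normal(loc=0.00, scale=1.0, size=n)
    treated = rng.normal(loc=0.01, scale=1.0, size=n)
    _, p = stats.ttest_ind(control, treated)
    return p

for n in (1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}  p = {p_value_for_tiny_effect(n):.2e}")

# As n grows, the p-value collapses toward zero even though the effect has
# no clinical importance, so "statistically significant" stops being informative.
```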
Here, we discuss some of the assumptions required when using both linear
and logistic regression. Regression requires an assumption of normality, as does the definition of confidence
intervals. However, most healthcare data follow exponential or gamma distributions. A gamma
distribution is non-symmetric with a very heavy tail. Most patients can be treated within general time
guidelines; however, there will always be a few patients who need extraordinary care, which is why
health outcomes have skewed, heavy tails in the data distribution. In fact, these extreme patients can
overwhelm the available healthcare dollars.
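A short sketch (with invented parameter values, assuming Python with NumPy rather than any tool named in the text) illustrates this shape: the heavy right tail pulls the mean above the median, and a small fraction of extreme patients accounts for a disproportionate share of the total.

```python
# Sketch: hypothetical gamma-distributed lengths of stay (shape=2, scale=3 days).
import numpy as np

rng = np.random.default_rng(1)
stay = rng.gamma(shape=2.0, scale=3.0, size=100_000)

mean, median = stay.mean(), np.median(stay)
p95 = np.percentile(stay, 95)
tail_share = stay[stay > p95].sum() / stay.sum()

print(f"mean = {mean:.1f} days, median = {median:.1f} days (mean > median: right skew)")
print(f"the top 5% of patients account for {tail_share:.0%} of all hospital days")
```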
Background
Use of the Central Limit Theorem assumes that patients can be treated within two standard deviations of the
average. However, with a heavy tail, the most extreme 5-10 percent of patients can lie well beyond those two
standard deviations (Battioui, 2007a). One reason for assuming normal distributions is that most of the
available statistical models require this assumption.
What happens if the assumption of normality is not valid? What happens if the Central Limit Theorem
lacks practical meaning? We must be able to work with data in the absence of these assumptions.
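One way to see the problem is to simulate it. The sketch below (assuming Python with NumPy; the sample sizes are illustrative) measures how skewed the sampling distribution of the mean remains when the underlying data are exponential rather than normal.

```python
# Sketch: skewness of the sample mean for exponential data at several sample
# sizes. Zero skewness would indicate the symmetry the normal model assumes.
import numpy as np

rng = np.random.default_rng(2)

def skewness_of_sample_mean(n, reps=10_000):
    """Draw `reps` samples of size n from an exponential distribution and
    return the skewness of the resulting sample means."""
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    centered = means - means.mean()
    return (centered**3).mean() / centered.std()**3

for n in (10, 30, 100, 1_000):
    print(f"n = {n:>5}  skewness of the sample mean ≈ {skewness_of_sample_mean(n):.2f}")

# Theory gives skewness 2/sqrt(n) for exponential data, so even n = 100
# leaves measurable skew in the sampling distribution of the mean.
```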
According to the Central Limit Theorem, the sample mean can be assumed normal if the sample is
sufficiently large. If the distribution is exponential, just how large is large enough? We need to examine
the requirements of the Central Limit Theorem to explore the concept of large. Also, we want to examine
patient-level data rather than group-level data, which means including information about the patient's
condition in any regression model. A modification of standard regression, called the generalized linear
model, can accommodate population distributions that are not normal; it can be used to model the gamma
and exponential distributions often found in medical data.
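As a sketch of how such a model might be fit, the example below simulates patient-level data and fits a gamma GLM with a log link. The variable names, coefficient values, and the choice of the statsmodels library are illustrative assumptions, not details taken from the text.

```python
# Sketch: a generalized linear model with a gamma family and log link fit to
# simulated patient-level data (age and a comorbidity flag are invented covariates).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 5_000

age = rng.uniform(20, 90, size=n)
comorbidity = rng.binomial(1, 0.3, size=n)
expected_stay = np.exp(0.5 + 0.01 * age + 0.4 * comorbidity)
stay = rng.gamma(shape=2.0, scale=expected_stay / 2.0)  # mean equals expected_stay

df = pd.DataFrame({"stay": stay, "age": age, "comorbidity": comorbidity})

# No normality assumption on the outcome: the gamma family handles the skew.
model = smf.glm(
    "stay ~ age + comorbidity",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()
print(model.summary())
```

With real claims or billing data, the same form of model can be used in place of an ordinary least squares severity index, without forcing a normal distribution onto a skewed outcome.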
Additional assumptions for regression are that the mean of the error term is equal to zero, and that
the error term has equal variance for different levels of the input or independent variables. While the
assumption of zero mean is almost always satisfied, the assumption of equal variance is not. Often, as
the independent variables increase in value, the variance increases as well. Therefore, modifications
are made to the variables, usually in the form of transformations, substituting the log of an independent