Under these assumptions, any patient is equally likely to enter any hospital to be treated. In reality, a patient usually enters
a hospital where their physician has privileges, so the randomization is not complete.
Moreover, we must assume that any patient's information is entered into
the billing record in exactly the same way regardless of the hospital chosen. Yet this assumption is never
put to the test by taking a group of patient records to multiple hospitals to see if the billing information
extracted is the same. In all likelihood, if such a test were performed, the different billing records would
vary considerably. In fact, it is widely known that different providers code differently.
Generally, regression models are used to define severity indices. However, a major problem with
traditional statistical methods is that they assume relatively small datasets, which generally do not reach
the 100,000 or 1,000,000 observations typically found in the large databases used for outcomes
research. When the datasets are so large, the p-value has little meaning and cannot be used to measure
the effectiveness of a statistical model; other measurements must be used instead. In this chapter, we
will discuss some of the statistical methods and some of the issues with large samples. In addition, we
will discuss the issues of model assumptions and model validity.
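To make the point about p-values concrete, the short simulation below is a sketch (assuming Python with NumPy and SciPy, which are not named in the text): with a million observations, even a clinically negligible difference between two groups produces a vanishingly small p-value.

```python
# Sketch: a two-sample t-test on a clinically negligible difference (0.01
# standard deviations). The effect size and sample sizes are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def p_value_for_tiny_effect(n):
    """p-value when the true difference in means is only 0.01 SD."""
    control = rng.normal(loc=0.00, scale=1.0, size=n)
    treated = rng.normal(loc=0.01, scale=1.0, size=n)
    _, p = stats.ttest_ind(control, treated)
    return p

for n in (1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}  p = {p_value_for_tiny_effect(n):.2e}")

# As n grows, the p-value collapses toward zero even though the effect has
# no clinical importance, so "statistically significant" stops being informative.
```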
Here, we discuss some of the assumptions required when using both linear
and logistic regression. Regression requires an assumption of normality, as does the definition of confidence
intervals. However, most healthcare data follow exponential or gamma distributions. A gamma
distribution is non-symmetric with a very heavy tail. Most patients can be treated within general time
guidelines; however, there will always be a few patients who need extraordinary care, which is why
health outcomes have skewed, heavy tails in the data distribution. In fact, these extreme patients can
overwhelm the available healthcare dollars.
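A short sketch (with invented parameter values, assuming Python with NumPy rather than any tool named in the text) illustrates this shape: the heavy right tail pulls the mean above the median, and a small fraction of extreme patients accounts for a disproportionate share of the total.

```python
# Sketch: hypothetical gamma-distributed lengths of stay (shape=2, scale=3 days).
import numpy as np

rng = np.random.default_rng(1)
stay = rng.gamma(shape=2.0, scale=3.0, size=100_000)

mean, median = stay.mean(), np.median(stay)
p95 = np.percentile(stay, 95)
tail_share = stay[stay > p95].sum() / stay.sum()

print(f"mean = {mean:.1f} days, median = {median:.1f} days (mean > median: right skew)")
print(f"the top 5% of patients account for {tail_share:.0%} of all hospital days")
```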
Background
Use of the Central Limit Theorem assumes that patients can be treated within two standard deviations of the
average. However, with a heavy tail, the most extreme 5-10 percent of patients can lie well beyond those two
standard deviations (Battioui, 2007a). One reason for assuming normal distributions is that most of the
available statistical models require this assumption.
What happens if the assumption of normality is not valid? What happens if the Central Limit Theorem
lacks practical meaning? We must be able to work with data in the absence of these assumptions.
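One way to see the problem is to simulate it. The sketch below (assuming Python with NumPy; the sample sizes are illustrative) measures how skewed the sampling distribution of the mean remains when the underlying data are exponential rather than normal.

```python
# Sketch: skewness of the sample mean for exponential data at several sample
# sizes. Zero skewness would indicate the symmetry the normal model assumes.
import numpy as np

rng = np.random.default_rng(2)

def skewness_of_sample_mean(n, reps=10_000):
    """Draw `reps` samples of size n from an exponential distribution and
    return the skewness of the resulting sample means."""
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    centered = means - means.mean()
    return (centered**3).mean() / centered.std()**3

for n in (10, 30, 100, 1_000):
    print(f"n = {n:>5}  skewness of the sample mean ≈ {skewness_of_sample_mean(n):.2f}")

# Theory gives skewness 2/sqrt(n) for exponential data, so even n = 100
# leaves measurable skew in the sampling distribution of the mean.
```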
According to the Central Limit Theorem, the sample mean can be assumed normal if the sample is
sufficiently large. If the distribution is exponential, just how large is large enough? We need to examine
the requirements of the Central Limit Theorem to explore the concept of large. Also, we want to examine
patient-level data rather than group-level data, which means including information about the patient's
condition in any regression model. A modification of standard regression, called the generalized linear
model, can accommodate population distributions that are not normal; it can be used to model the gamma
and exponential distributions often found in medical data.
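As a sketch of how such a model might be fit, the example below simulates patient-level data and fits a gamma GLM with a log link. The variable names, coefficient values, and the choice of the statsmodels library are illustrative assumptions, not details taken from the text.

```python
# Sketch: a generalized linear model with a gamma family and log link fit to
# simulated patient-level data (age and a comorbidity flag are invented covariates).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 5_000

age = rng.uniform(20, 90, size=n)
comorbidity = rng.binomial(1, 0.3, size=n)
expected_stay = np.exp(0.5 + 0.01 * age + 0.4 * comorbidity)
stay = rng.gamma(shape=2.0, scale=expected_stay / 2.0)  # mean equals expected_stay

df = pd.DataFrame({"stay": stay, "age": age, "comorbidity": comorbidity})

# No normality assumption on the outcome: the gamma family handles the skew.
model = smf.glm(
    "stay ~ age + comorbidity",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()
print(model.summary())
```

With real claims or billing data, the same form of model can be used in place of an ordinary least squares severity index, without forcing a normal distribution onto a skewed outcome.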
Additional assumptions for regression are that the mean of the error term is equal to zero, and that
the error term has equal variance for different levels of the input or independent variables. While the
assumption of zero mean is almost always satisfied, the assumption of equal variance is not. Often, as
the independent variables increase in value, the variance increases as well. Therefore, modifications
are made to the variables, usually in the form of transformations, substituting the log of an independent