The presence of such bias does not mean we should abandon our
attempts at modeling, but that we should be aware of and report our
limitations.
Stationarity
An underlying assumption of regression methods is that relationships
among variables remain constant during the data collection period. If they
do not, for example because the variables we are measuring undergo seasonal
or other detectable changes, then we need to account for those changes. A
multivariate approach is called for, as described in the next chapter.
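One crude way to see whether a relationship held steady over the collection period is to fit the same model to an early and a late portion of the data and compare the coefficients. The sketch below is only a diagnostic, not the multivariate approach referred to above; the helper names (fit_line, slopes_by_period), the split point, and the data are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def slopes_by_period(xs, ys, split):
    """Fit separate lines to the observations before and after index `split`."""
    _, early = fit_line(xs[:split], ys[:split])
    _, late = fit_line(xs[split:], ys[split:])
    return early, late


# Hypothetical series: slope 2 for the first six points, slope 0.5 afterward.
xs = list(range(12))
ys = [2.0 * x for x in xs[:6]] + [10.0 + 0.5 * (x - 5) for x in xs[6:]]

early_slope, late_slope = slopes_by_period(xs, ys, 6)
print(f"early slope = {early_slope:.2f}, late slope = {late_slope:.2f}")
```

A marked difference between the two slopes suggests that a single regression line fitted to the whole period would misrepresent the relationship in both halves.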
Practical Versus Statistical Significance
An association can be of statistical significance without being of the least
practical value. In the study by Kanarek et al. [1980] referenced above, a
100-fold increase in asbestos fiber concentration is associated with perhaps
a 5% increase in lung cancer rates. Do we care? Perhaps, because no life
can be considered unimportant. But courts traditionally have looked for at
least a twofold increase in incidence before awarding damages. (See, for
example, the citations in Chapter 6 of Good, 2001b.) And in this particular
study, there is reason to believe there might be other hidden cofactors
that are at least as important as the presence of asbestos fiber.
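The gap between the two kinds of significance can be made concrete with a small calculation. The sketch below uses entirely hypothetical counts (they are not taken from Kanarek et al.) in which the exposed group shows a 5% higher incidence; with very large samples a two-proportion z-test finds the difference overwhelmingly significant, yet the relative risk stays far below a twofold increase. The function name two_proportion_z is an assumption made for this example.

```python
import math


def two_proportion_z(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical counts, chosen only to illustrate the point.
unexposed_cases, unexposed_n = 100_000, 10_000_000
exposed_cases, exposed_n = 105_000, 10_000_000

z, p_value = two_proportion_z(unexposed_cases, unexposed_n,
                              exposed_cases, exposed_n)
relative_risk = (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

print(f"z = {z:.1f}, p-value = {p_value:.3g}, relative risk = {relative_risk:.2f}")
print("meets a twofold threshold:", relative_risk >= 2.0)
```

The p-value answers only whether the difference is distinguishable from chance; the relative risk is what speaks to practical importance.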
Goodness-of-Fit Versus Prediction
As noted above, we have a choice of “fitting methods.” We can minimize
the sum of the squares of the deviations between the observed and model
values, or we can minimize the sum of the absolute values of these
deviations, or we can minimize some entirely different function. Suppose that
we have followed the advice given above and have chosen our
goodness-of-fit criterion to be identical with our loss function.
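The choice of criterion changes the answer even in the simplest model, a single constant fitted to a set of observations: the mean minimizes the sum of squared deviations, while the median minimizes the sum of absolute deviations. The sketch below illustrates this with made-up observations that include one extreme value; the helper names are assumptions for the example.

```python
from statistics import mean, median


def sum_squared(values, c):
    """Sum of squared deviations of the values from the constant c."""
    return sum((v - c) ** 2 for v in values)


def sum_absolute(values, c):
    """Sum of absolute deviations of the values from the constant c."""
    return sum(abs(v - c) for v in values)


# Hypothetical observations with one extreme value.
observations = [1.0, 2.0, 3.0, 4.0, 100.0]

ls_fit = mean(observations)     # least-squares choice of the constant
lad_fit = median(observations)  # least-absolute-deviations choice

for label, c in (("mean", ls_fit), ("median", lad_fit)):
    print(f"{label:>6} = {c:5.1f}  "
          f"sum of squares = {sum_squared(observations, c):7.1f}  "
          f"sum of |dev| = {sum_absolute(observations, c):6.1f}")
```

Each fitted value wins under its own criterion and loses under the other, which is why the criterion should be matched to the losses we actually expect to incur.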
For example, suppose the losses are proportional to the square of the
prediction errors, and we have chosen our model's parameters so as to
minimize the sum of squares of the differences $y_i - M[x_i]$ for the historical
data. Unfortunately, minimizing this sum of squares on the historical data
is no guarantee that, as we continue to make observations, the same model
will continue to minimize the sum of squared differences between what we
observe and what it predicts. If you are a businessman whose objective is to
predict market response, this distinction can be critical.
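The distinction can be checked directly by holding back later observations and comparing the sum of squared errors on the data used for fitting with the sum on the data that followed. The sketch below is a minimal illustration under stated assumptions: a straight-line model, hypothetical helper names, and made-up numbers in which growth slows after the fitting period.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sum_squared_errors(intercept, slope, xs, ys):
    """Sum of squared differences between observations and line predictions."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical historical data used to choose the parameters.
hist_x = [1, 2, 3, 4, 5]
hist_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical later observations, not used in fitting.
new_x = [6, 7, 8, 9, 10]
new_y = [11.0, 12.1, 12.9, 14.2, 15.0]

intercept, slope = fit_line(hist_x, hist_y)
sse_hist = sum_squared_errors(intercept, slope, hist_x, hist_y)
sse_new = sum_squared_errors(intercept, slope, new_x, new_y)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print(f"sum of squares on historical data: {sse_hist:.3f}")
print(f"sum of squares on later data:      {sse_new:.3f}")
```

A small sum of squares on the historical data says how well the model describes the past; only the second figure says anything about how well it predicts.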
There are at least three reasons for the possible disparity:
1. The original correlation was spurious.
2. The original correlation was genuine but the sample was not
representative.
3. The original correlation was genuine, but the nature of the relationship
has changed with time (as a result of changes in the