Statistically significant findings should serve as a motivation for further
corroborative and collateral research rather than as a basis for conclusions.
Checklist: Write down and confirm your assumptions before you begin.
• Data cover an adequate range. Slope of line not dependent on a few isolated values.
• Model is plausible and has or suggests a causal basis.
• Relationships among variables remained unchanged during the data collection period and will remain unchanged in the near future.
• Uncontrolled variables are accounted for.
• Loss function is known and will be used to determine the goodness-of-fit criteria.
• Observations are independent, or the form of the dependence is known or is a focus of the investigation.
• Regression method is appropriate for the types of data involved and the nature of the relationship.
• Distribution of residual errors is known.
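Two items on this checklist can be partly checked numerically. The Python sketch below, which uses no third-party libraries, refits an ordinary least-squares (OLS) line with each observation dropped in turn; if the refitted slopes vary widely, the slope depends on a few isolated values. It also computes the Durbin-Watson statistic on the residuals as a rough check of independence when the observations have a natural order, such as time. The function names and the small data set are hypothetical, chosen only for illustration, and neither check replaces the judgment the checklist calls for.

```python
def ols_fit(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out_slopes(xs, ys):
    """Slope refitted with each observation dropped in turn.

    A wide spread among these slopes suggests the fitted slope
    hinges on a few isolated values."""
    slopes = []
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        slopes.append(ols_fit(x_rest, y_rest)[1])
    return slopes


def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order.

    Values near 2 are consistent with uncorrelated successive residuals;
    values near 0 suggest strong positive serial correlation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical data: nine points on y = x plus one isolated point at x = 20.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

intercept, slope = ols_fit(xs, ys)
loo = leave_one_out_slopes(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"full-data slope: {slope:.3f}")
print(f"leave-one-out slopes range from {min(loo):.3f} to {max(loo):.3f}")
print(f"Durbin-Watson: {durbin_watson(residuals):.2f}")
```

With these hypothetical data the full-data slope is close to zero (about -0.03), while dropping the single point at x = 20 gives a slope of exactly 1: one isolated value is driving the fit.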
TO LEARN MORE
David Freedman's [1999] article on association and causation is essential
reading. Lieberson [1985] has many examples of spurious association.
Friedman, Furberg, and DeMets [1996] cite a number of examples of clinical
trials using misleading surrogate variables.
Mosteller and Tukey [1977] expand on many of the points raised here
concerning the limitations of linear regression. Mielke and Berry [2001,
Section 5.4] provide a comparison of MRPP (multi-response permutation
procedures), Cade-Richards, and OLS regression methods. Distribution-free
methods for comparing regression lines among strata are described by
Good [2001, pp. 168-169].
For more on Simpson's paradox, see
http://www.cawtech.freeserve.co.uk/simpsons.2.html. For a real-world
example, search under Simpson's paradox for an analysis of racial bias in
New Zealand Jury Service at http://www.stats.govt.nz.
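The reversal that defines Simpson's paradox can be reproduced with a few counts. In the Python sketch below, the counts are hypothetical and are not taken from either source cited above: treatment A has the higher success rate within each stratum, yet the lower rate once the strata are pooled, because A was given mostly to patients in the severe stratum.

```python
# Hypothetical counts: (successes, trials) for each treatment within each stratum.
data = {
    "mild":   {"A": (90, 100),   "B": (800, 1000)},
    "severe": {"A": (300, 1000), "B": (20, 100)},
}


def stratum_rates(data):
    """Success rate of each treatment within each stratum."""
    return {s: {t: k / n for t, (k, n) in arms.items()} for s, arms in data.items()}


def pooled_rates(data):
    """Success rate of each treatment after ignoring the strata."""
    totals = {}
    for arms in data.values():
        for t, (k, n) in arms.items():
            sk, sn = totals.get(t, (0, 0))
            totals[t] = (sk + k, sn + n)
    return {t: k / n for t, (k, n) in totals.items()}


for stratum, rates in stratum_rates(data).items():
    print(stratum, {t: round(r, 3) for t, r in rates.items()})
print("pooled", {t: round(r, 3) for t, r in pooled_rates(data).items()})
```

Within each stratum A beats B by 10 percentage points, but pooled, A succeeds in 390 of 1,100 cases (about 35%) and B in 820 of 1,100 (about 75%).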