Statistically significant findings should serve as a motivation for further corroborative and collateral research rather than as a basis for conclusions.
Checklist: Write down and confirm your assumptions before you begin.
Data cover an adequate range. Slope of line not dependent on a few isolated values.
Model is plausible and has or suggests a causal basis.
Relationships among variables remained unchanged during the data collection period and will remain unchanged in the near future.
Uncontrolled variables are accounted for.
Loss function is known and will be used to determine the goodness-of-fit criteria.
Observations are independent, or the form of the dependence is known or is a focus of the investigation.
Regression method is appropriate for the types of data involved and the nature of the relationship.
Distribution of residual errors is known.
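The checklist item about the slope not depending on a few isolated values can be checked mechanically. What follows is a minimal sketch, assuming a simple linear regression of y on a single x fitted by ordinary least squares. It refits the slope with each observation deleted in turn and reports the largest relative change. The function names and the example data are hypothetical and chosen for illustration. This is a rough stand-in for formal influence diagnostics such as Cook's distance, not a method given in the text.

```python
"""Leave-one-out check: does the OLS slope hinge on a few isolated points?"""


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    return sxy / sxx


def leave_one_out_slopes(xs, ys):
    """Refit the slope once per observation, each time omitting that observation."""
    return [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


def max_relative_slope_change(xs, ys):
    """Largest |slope_without_i - slope_full| / |slope_full| over all i."""
    full = ols_slope(xs, ys)
    if full == 0:
        raise ValueError("full-data slope is zero; relative change undefined")
    return max(abs(s - full) / abs(full) for s in leave_one_out_slopes(xs, ys))


if __name__ == "__main__":
    # Hypothetical data: five points on y = 2x plus one isolated point at x = 20.
    xs = [1, 2, 3, 4, 5, 20]
    ys = [2, 4, 6, 8, 10, 10]
    print("full-data slope:", ols_slope(xs, ys))
    # Dropping the point at x = 20 (index 5) restores a slope of 2.
    print("leave-one-out slopes:", leave_one_out_slopes(xs, ys))
    print("max relative change:", max_relative_slope_change(xs, ys))
```

Here the single point at x = 20 pulls the full-data slope from 2 down to about 0.31, which is exactly the kind of dependence the checklist warns against. The sketch does not settle what counts as a "large" change; that remains a judgment call. Deleting one point at a time can also miss a cluster of two or three influential values.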
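For the checklist item on independence, one common check applies when observations are collected in sequence: the Durbin-Watson statistic computed on the regression residuals e[1..n]. It is d = (sum over t = 2..n of (e[t] - e[t-1])^2) / (sum over t = 1..n of e[t]^2). Values near 2 are consistent with no lag-1 autocorrelation. Values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below assumes a single predictor, an OLS fit, and that list order is collection order. The helper names are hypothetical. The sketch does not compute a p-value and detects only lag-1 dependence.

```python
"""Durbin-Watson statistic for lag-1 autocorrelation in OLS residuals."""


def ols_residuals(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return y - fitted, in input order."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the residual sum of squares."""
    ss = sum(e * e for e in residuals)
    if ss == 0:
        raise ValueError("residuals are all zero; statistic is undefined")
    diffs = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return diffs / ss


if __name__ == "__main__":
    # Hypothetical series fitted with a straight line although it is curved,
    # so the residuals follow a systematic pattern rather than scattering at random.
    xs = [0, 1, 2, 3, 4]
    ys = [0, 1, 4, 9, 16]
    res = ols_residuals(xs, ys)
    print("residuals:", res)  # [2.0, -1.0, -2.0, -1.0, 2.0]
    print("Durbin-Watson d:", durbin_watson(res))
```

With only five points the statistic says little; the example is meant to show the mechanics, not to reach a verdict. If the order of the observations carries no meaning, d carries none either.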
TO LEARN MORE
David Freedman's [1999] article on association and causation is must
reading. Lieberson [1985] has many examples of spurious association.
Friedman, Furberg and DeMets [1996] cite a number of examples of clinical trials using misleading surrogate variables.
Mosteller and Tukey [1977] expand on many of the points raised here
concerning the limitations of linear regression. Mielke and Berry [2001,
Section 5.4] provide a comparison of MRPP, Cade-Richards, and OLS
regression methods. Distribution-free methods for comparing regression
lines among strata are described by Good [2001, pp. 168-169].
For more on Simpson's paradox, see
http://www.cawtech.freeserve.co.uk/simpsons.2.html. For a real-world
example, search for "Simpson's paradox" at http://www.stats.govt.nz to find an
analysis of racial bias in New Zealand jury service.