Univariate Regression - Common Errors in Statistics

Statistically significant findings should serve as a motivation for further corroborative and collateral research rather than as a basis for conclusions.

Checklist: Write down and confirm your assumptions before you begin.

• Data cover an adequate range. Slope of line not dependent on a few isolated values.

• Model is plausible and has or suggests a causal basis.

• Relationships among variables remained unchanged during the data collection period and will remain unchanged in the near future.

• Uncontrolled variables are accounted for.

• Loss function is known and will be used to determine the goodness of fit criteria.

• Observations are independent, or the form of the dependence is known or is a focus of the investigation.

• Regression method is appropriate for the types of data involved and the nature of the relationship.

• Is the distribution of residual errors known?
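One way to check the first item on the checklist (that the slope does not depend on a few isolated values) is to refit the line with each observation left out in turn and see how far the slope moves. The sketch below is only an illustration: the data are hypothetical, and the helper names `ols_slope` and `leave_one_out_slopes` are introduced here and do not come from the text.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_out_slopes(xs, ys):
    """Slope refitted with each observation dropped in turn."""
    return [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


# Hypothetical data: five clustered points with no trend, plus one
# isolated point far to the right.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys = [2.0, 1.0, 2.0, 1.0, 2.0, 10.0]

full_slope = ols_slope(xs, ys)
loo = leave_one_out_slopes(xs, ys)

print(f"slope using all points: {full_slope:.3f}")
for i, s in enumerate(loo):
    print(f"slope without point {i}: {s:.3f}")
```

With these made-up values the fitted slope is clearly positive when all six points are used, but it falls to essentially zero once the isolated point is dropped. A swing of that size suggests the apparent relationship rests on a single observation.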

TO LEARN MORE

David Freedman's [1999] article on association and causation is must reading. Lieberson [1985] has many examples of spurious association. Friedman, Furberg and DeMets [1996] cite a number of examples of clinical trials using misleading surrogate variables.

Mosteller and Tukey [1977] expand on many of the points raised here concerning the limitations of linear regression. Mielke and Berry [2001, Section 5.4] provide a comparison of MRPP, Cade-Richards, and OLS regression methods. Distribution-free methods for comparing regression lines among strata are described by Good [2001, pp. 168-169].

For more on Simpson's paradox, see http://www.cawtech.freeserve.co.uk/simpsons.2.html. For a real-world example, search under Simpson's paradox for an analysis of racial bias in New Zealand Jury Service at http://www.stats.govt.nz.
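For readers unfamiliar with Simpson's paradox, a small numerical sketch may help. The counts below are entirely hypothetical and are not taken from either source cited above. They are chosen so that group A has the higher success rate within each stratum, yet the lower rate once the strata are pooled.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts for groups A and B in two strata.
counts = {
    "stratum_1": {"A": (8, 10), "B": (70, 100)},
    "stratum_2": {"A": (20, 100), "B": (1, 10)},
}
groups = ("A", "B")

# Success rate of each group within each stratum.
within = {
    stratum: {g: Fraction(*cells[g]) for g in groups}
    for stratum, cells in counts.items()
}

# Success rate of each group after pooling the strata.
pooled = {
    g: Fraction(
        sum(cells[g][0] for cells in counts.values()),
        sum(cells[g][1] for cells in counts.values()),
    )
    for g in groups
}

for stratum, rates in within.items():
    print(stratum, {g: round(float(r), 3) for g, r in rates.items()})
print("pooled", {g: round(float(r), 3) for g, r in pooled.items()})
```

The reversal arises because the two groups are distributed very differently across the strata: most of A's trials fall in the low-success stratum and most of B's in the high-success one. This is one reason the checklist asks that uncontrolled variables be accounted for.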
