“A second sort of bias comes into play because some galaxies are too
faint or small to be in the sample; in effect, the large-distance tail of P(d | r)
is cut off. It follows that the typical inferred distances are smaller than
those expected at a given true distance r. As a result, the peculiar velocity
model that allows true distance to be estimated as a function of redshift is
tricked into returning shorter distances. This bias goes in the same sense
as Malmquist bias, but is fundamentally different.” It results not from
volume/density effects, but from the same sort of sample selection effects
that were discussed earlier in this section.
Selection bias can be minimized by working in the “inverse direction.”
Rather than trying to predict absolute magnitude (Y) given a value of the
velocity width parameter (X), one fits a line by regressing the
widths X on the magnitudes Y.
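The effect of working in the inverse direction can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not the astronomers' actual procedure: the linear relation, the slope of 2, the noise level, and the cutoff `y_limit` are all hypothetical values chosen for illustration, and the sample is assumed to be selected on Y alone.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope for regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n=20000, slope=2.0, noise_sd=1.0, y_limit=0.0, seed=0):
    """Compare forward (Y on X) and inverse (X on Y) fits with and without selection.

    All parameter values are hypothetical; selection keeps only objects
    whose Y value falls below y_limit (brighter objects have smaller magnitudes).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]

    # Selection depends on Y only.
    kept = [(x, y) for x, y in zip(xs, ys) if y < y_limit]
    xs_kept = [x for x, _ in kept]
    ys_kept = [y for _, y in kept]

    return {
        "forward_full": ols_slope(xs, ys),
        "forward_selected": ols_slope(xs_kept, ys_kept),
        "inverse_full": ols_slope(ys, xs),
        "inverse_selected": ols_slope(ys_kept, xs_kept),
    }


slopes = simulate_slopes()
for name, value in slopes.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the forward slope fitted to the selected sample falls well below the forward slope in the full sample, while the inverse slope is essentially unchanged by selection. That is what one would expect when inclusion in the sample depends only on Y, because the distribution of X at a given Y is then unaffected.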
Finally, bias can result from grouping or averaging data. The bias that arises
when group-randomized trials are analyzed without correcting for cluster effects
was reported by Feng et al. [1996]; see Chapter 5. The use of averaged rather
than end-of-period data in financial research results in biased estimates of
the variance, covariance, and autocorrelation of the first- as well as higher-
order changes. Such biases can be both time varying and persistent
(Wilson, Jones, and Lundstrum, 2001).
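The averaging effect can be seen in a simulation. The sketch below assumes a simple Gaussian random walk sampled `steps` times per period; this is an illustrative assumption, not the model used by Wilson, Jones, and Lundstrum, and the function names and parameter values are hypothetical.

```python
import random
import statistics


def lag1_autocorr(xs):
    """Sample first-order autocorrelation of a series."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


def simulate_changes(periods=5000, steps=20, seed=1):
    """Return period-to-period changes of end-of-period and period-averaged values."""
    rng = random.Random(seed)
    level = 0.0
    end_values, avg_values = [], []
    for _ in range(periods):
        path = []
        for _ in range(steps):
            level += rng.gauss(0.0, 1.0)  # independent increments
            path.append(level)
        end_values.append(path[-1])             # end-of-period observation
        avg_values.append(statistics.fmean(path))  # within-period average
    end_changes = [b - a for a, b in zip(end_values, end_values[1:])]
    avg_changes = [b - a for a, b in zip(avg_values, avg_values[1:])]
    return end_changes, avg_changes


end_changes, avg_changes = simulate_changes()
variance_ratio = statistics.variance(avg_changes) / statistics.variance(end_changes)
rho_end = lag1_autocorr(end_changes)
rho_avg = lag1_autocorr(avg_changes)
print(f"variance ratio (averaged / end-of-period): {variance_ratio:.3f}")
print(f"lag-1 autocorrelation, end-of-period changes: {rho_end:.3f}")
print(f"lag-1 autocorrelation, averaged changes: {rho_avg:.3f}")
```

Although the underlying increments are independent, changes in the averaged series show a noticeably smaller variance and a clearly positive first-order autocorrelation, whereas changes in the end-of-period series show an autocorrelation near zero.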
REPORTING POWER
Statisticians are routinely forced to guess at the values of population
parameters in order to make the power calculations needed to determine
sample size. Once the data are in hand, it's tempting to redo these same
power calculations. Don't. Post hoc calculations invariably inflate the
actual power of the test (Zumbo and Hubley, 1998).
Post hoc power calculations can be of value in designing follow-up
studies, but should not be used in reports.
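One way to see the inflation is to simulate a study with low true power and compute post hoc power only for the results that reach significance. The sketch below assumes a two-sided z test whose statistic is normally distributed with unit variance; the true effect of 1.0, the function names, and the number of trials are hypothetical choices for illustration.

```python
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided_z(effect, alpha=0.05):
    """Power of a two-sided z test when the statistic is N(effect, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - STD_NORMAL.cdf(z_crit - effect)
    lower = STD_NORMAL.cdf(-z_crit - effect)
    return upper + lower


def mean_post_hoc_power(true_effect, trials=20000, alpha=0.05, seed=2):
    """Average post hoc power over simulated studies that reached significance.

    Post hoc power here means plugging the observed statistic in place of
    the unknown true effect.
    """
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    post_hoc = []
    for _ in range(trials):
        z_obs = rng.gauss(true_effect, 1.0)
        if abs(z_obs) > z_crit:
            post_hoc.append(power_two_sided_z(z_obs, alpha))
    return statistics.fmean(post_hoc)


true_power = power_two_sided_z(1.0)
reported_power = mean_post_hoc_power(1.0)
print(f"true power: {true_power:.3f}")
print(f"mean post hoc power among significant results: {reported_power:.3f}")
```

Under these assumptions, any result that is just significant yields a post hoc power of about one half, and more extreme results yield more, so the post hoc figure for a significant finding exceeds the true power whenever the true power is below one half.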
DRAWING CONCLUSIONS
Found data (nonrandom samples) can be very useful in suggesting models
and hypotheses for further exploration. But without a randomized study,
formal inferential statistical analyses are not supported (Greenland, 1990;
Rothman, 1990b). The concepts of significance level, power, p value, and
confidence interval apply only to data that have arisen from carefully
designed and executed experiments and surveys.
A vast literature has grown up around the unease researchers feel in
placing too much reliance on p values. Examples include Selvin [1957],
Yoccuz [1991], Badrick and Flatman [1999], Feinstein [1998], Johnson