be used to determine if the observed regression slope (or correlation coefficient) could
have been produced by a random association among X and Y variables. If the observed
slope (or correlation coefficient) is outside the 95% confidence interval of the permutation
sets, then we can reject the null hypothesis that the slope (or correlation coefficient) does
not differ from zero. Note that the permutation test estimates the range of slopes (or
correlation coefficients) produced by the null model, not by the observed data. Thus we reject the
null hypothesis by showing that the observed statistic lies outside the range of the values
predicted by the null model.
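The permutation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default of 999 permutations, and the use of simple percentiles for the 95% interval are all assumptions made for the example.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def permutation_slope_test(x, y, n_perm=999, alpha=0.05, seed=0):
    """Compare the observed slope with slopes produced under the null model.

    The null model is simulated by shuffling Y relative to X, which
    destroys any association between the two variables.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = slope(x, y)
    # Slopes expected when X and Y are randomly associated
    null_slopes = np.array([slope(x, rng.permutation(y)) for _ in range(n_perm)])
    # Percentile interval of the null distribution (assumed two-sided)
    lo, hi = np.quantile(null_slopes, [alpha / 2, 1 - alpha / 2])
    reject = bool(observed < lo or observed > hi)
    return observed, (float(lo), float(hi)), reject

# Hypothetical data: Y depends linearly on X, plus noise
rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
y = 2.0 * x + rng.normal(0.0, 1.0, size=30)
observed, null_interval, reject = permutation_slope_test(x, y)
```

Note that the interval returned here describes the null model, so it is expected to bracket zero; the test asks whether the observed slope falls outside it.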
To carry out a bootstrap test of the significance of the regression line, two approaches are
available: one is to bootstrap the paired observations (X_i, Y_i); the other is to bootstrap the
residuals from the regression. Note that we could also use permutation methods, on either the
raw data or the residuals. When bootstrapping specimens, we form each bootstrap set by
sampling (with replacement) from the paired specimen values (X_i, Y_i).
The regression model is fitted and the slope (or correlation coefficient) is determined for each
bootstrap set; the distribution of these values gives a bootstrap estimate of the confidence
interval for the slope (or correlation coefficient). Because this is a confidence interval on the
slope itself, if it excludes zero we can reject the null hypothesis that the regression slope
(or correlation) is zero.
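Bootstrapping the paired observations might look like the sketch below. The function name, the 2000 bootstrap sets, and the percentile method for the interval are assumptions chosen for illustration; other interval constructions are possible.

```python
import numpy as np

def bootstrap_pairs_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the slope, resampling (X_i, Y_i) pairs."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n specimens with replacement; each X_i stays paired with its Y_i
        idx = rng.integers(0, n, size=n)
        # np.polyfit returns coefficients highest degree first: [slope, intercept]
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    lo, hi = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

If the returned interval excludes zero, the null hypothesis of zero slope would be rejected at the chosen alpha. With very small samples a bootstrap set can contain a single repeated X value, which makes the slope undefined; this sketch does not guard against that case.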
The alternative is to bootstrap the residuals, by first determining the residuals from the
regression, and the Y values that are predicted by the regression model for each X value:
Y_{predicted} = A + BX    (8A.44)
Then the residuals are randomly combined with the paired X_i and Y_predicted values, both of
which are resampled (with replacement). This approach produces a wider variety of possible
paired values of X_i and Y_i; it can be thought of as bootstrapping the variable part of
the distribution, independently of the portion that is dependent on X. The range of slopes
(or correlation coefficients) is determined over many bootstrap sets; if the 95% confidence
interval for the slope (or correlation coefficient) excludes zero, we can infer that there is a
statistically significant dependence of Y on X at the 5% significance level.
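A sketch of the residual bootstrap as described above: the (X, predicted Y) pairs and the residuals are resampled independently, then recombined. The function name, the number of bootstrap sets, and the percentile interval are assumptions for illustration only.

```python
import numpy as np

def bootstrap_residuals_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the slope, resampling residuals."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Fit the regression once: predicted Y = A + B * X
    b_hat, a_hat = np.polyfit(x, y, 1)
    y_pred = a_hat + b_hat * x
    resid = y - y_pred
    slopes = np.empty(n_boot)
    for k in range(n_boot):
        # Resample the (X_i, predicted Y_i) pairs with replacement
        pair_idx = rng.integers(0, n, size=n)
        # Independently resample the residuals with replacement
        res_idx = rng.integers(0, n, size=n)
        # New Y values: the X-dependent part plus a randomly assigned residual
        y_star = y_pred[pair_idx] + resid[res_idx]
        slopes[k] = np.polyfit(x[pair_idx], y_star, 1)[0]
    lo, hi = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Because any residual can be attached to any predicted value, the bootstrap sets include (X, Y) combinations that never occurred in the data. This treats the residuals as exchangeable across X, which is an assumption worth checking before relying on the interval.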
The discussion of how a permutation test is used to determine the statistical significance
of a regression slope serves as a useful illustration of the differences in approach between
bootstrap and permutation methods. In the permutation method, the approach is to estimate
the confidence interval under the null model, given the distribution of observed data.
Thus, if the observed statistic is outside the confidence interval of the null, the observed
statistic is judged to be significant. In contrast, the bootstrap approach estimates the range
of the statistic on the observed data (rather than the range under the null). Permutation tests
almost always focus on estimating distributions under the assumption that the null model
is true, whereas bootstrap methods can be used to estimate the distribution of a statistic
either over the observed data or under an assumption that the null is true.
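To illustrate the last point, a bootstrap can also be set up to simulate the null model. One possible way, assumed here for illustration and not taken from the text, is to resample X and Y independently with replacement, so that the pairing between them is broken. The function name and defaults are likewise hypothetical.

```python
import numpy as np

def bootstrap_null_slope_interval(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval of slopes when X and Y are resampled independently."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    null_slopes = np.empty(n_boot)
    for b in range(n_boot):
        # Separate index draws for X and Y remove any dependence of Y on X
        x_star = x[rng.integers(0, n, size=n)]
        y_star = y[rng.integers(0, n, size=n)]
        null_slopes[b] = np.polyfit(x_star, y_star, 1)[0]
    lo, hi = np.quantile(null_slopes, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

As with the permutation test, the observed slope is then compared against this null interval, rather than checking whether an interval around the observed slope excludes zero.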
Issues Common to All Resampling Methods
Statistical Power
When evaluating the utility of statistical tests we tend to focus on the rate of Type II
errors (i.e. failing to reject the null hypothesis when it is false and the alternative is true).