$$s_{XY} = \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) \tag{8A.39}$$
are the values of A and B that minimize the summed squared residuals ($\varepsilon_i$). This sum of
squared error terms is:
$$\mathrm{Error} = \sum_{i=1}^{N} (Y_i - A - B X_i)^2 = \sum_{i=1}^{N} (\varepsilon_i)^2 \tag{8A.40}$$
under the assumption that the residuals are independently and identically normally
distributed.
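The least-squares fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standard closed-form estimates B = s_XY / s_XX and A = mean(Y) - B * mean(X), which are not restated in this passage, and the function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of Y = A + B*X (illustrative sketch).

    Assumes the standard estimates B = s_XY / s_XX and
    A = mean(Y) - B * mean(X); returns (A, B, Error), where Error is
    the sum of squared residuals of Eq. 8A.40.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of X and sum of cross-products (Eq. 8A.39)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Sum of squared residuals (Eq. 8A.40)
    error = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, error


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, error = fit_line(x, y)
print(f"A = {a:.3f}, B = {b:.3f}, Error = {error:.4f}")
```

Any other choice of A and B gives a larger value of Error for the same data, which is what "least squares" means here.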
To show that there is a statistically significant dependence of Y on X, it is sufficient to
show that the confidence interval on the slope excludes zero. This is equivalent to showing
that there is a non-zero correlation between Y and X, which may be tested using the
squared value of the correlation coefficient ($R^2$) between X and Y, which indicates the
fraction of the variance in the dependent variable (Y) that is explained by the independent
variable (X). The expression for $R^2$ is:
$$R^2 = \frac{s_{XY}^2}{s_{XX}\, s_{YY}} \tag{8A.41}$$
where
$$s_{YY} = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \tag{8A.42}$$
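As a rough check on the "fraction of variance explained" reading, the sketch below computes the squared correlation two ways. It assumes the numerator is the squared cross-product sum, s_XY^2, and that s_XX is defined analogously to s_YY; the function names and data are hypothetical.

```python
def sums_of_squares(x, y):
    """Return (x_bar, y_bar, s_xx, s_yy, s_xy) for paired samples."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, s_xx, s_yy, s_xy


def r_squared(x, y):
    """Squared correlation: s_XY^2 / (s_XX * s_YY)."""
    _, _, s_xx, s_yy, s_xy = sums_of_squares(x, y)
    return s_xy ** 2 / (s_xx * s_yy)


def r_squared_from_residuals(x, y):
    """Same quantity as 1 - Error / s_YY, using the least-squares line."""
    x_bar, y_bar, s_xx, s_yy, s_xy = sums_of_squares(x, y)
    b = s_xy / s_xx              # assumed standard slope estimate
    a = y_bar - b * x_bar        # assumed standard intercept estimate
    error = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - error / s_yy


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, -1.0, 1.0, -1.0]
print(r_squared(x, y), r_squared_from_residuals(x, y))
```

For a simple linear regression the two functions agree up to rounding, which is why the squared correlation can be read as the proportion of the variance in Y accounted for by X.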
It is very common to interpret high $R^2$ values as indicating high explanatory
power in a regression model. Whether an $R^2$ value is statistically significant can be
tested (under the assumption of normality of the residuals) using the expression:
$$\frac{1}{2}\ln\left(\frac{1 + R}{1 - R}\right) \tag{8A.43}$$
which is an approximately normally distributed variable, with variance equal to $1/(N - 3)$, where N is the
sample size.
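The transformation above can be turned into a significance test roughly as follows. This is a sketch under stated assumptions: the null hypothesis is taken to be a true correlation of zero, the test is two-sided, and the function name `fisher_z_test` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n):
    """Two-sided test of zero correlation via the transform of Eq. 8A.43.

    r is the sample correlation coefficient (-1 < r < 1) and n the
    sample size (n > 3). Returns (z, p_value).
    """
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    std_error = 1.0 / math.sqrt(n - 3)   # variance is 1 / (N - 3)
    # Assumption: null hypothesis of zero correlation, two-sided p-value
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z) / std_error))
    return z, p_value


# Hypothetical values, for illustration only
z, p = fisher_z_test(0.5, 28)
print(f"z = {z:.4f}, p = {p:.4f}")
```

Note that the transform is applied to R, not to $R^2$, so the sign of the correlation is retained.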
The significance of the slope can be assessed by a permutation test. The objective is to
determine the range of slopes that could be generated by random permutations of the
associations among X and Y values, since the null hypothesis implies that these values are
exchangeable. Thus, we again adopt the strategy of assuming that the null hypothesis is
true (which, in this case, is that the associations among X and Y values are random). The
associations of the X i values with the Y i are then randomized, generating a permutation
set of paired X and Y values with the same distribution of X and Y values as in the data,
but with randomized combinations of X and Y. The regression model is then fitted to each
permutation set, and the slope (or correlation coefficient) is calculated. The distribution of
the regression slopes (or the correlation coefficients) generated by the permutation sets can
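The permutation procedure described above can be sketched as follows. This is an illustrative outline rather than the authors' code: the number of permutations, the two-sided comparison of absolute slopes, and the (count + 1) / (n_perm + 1) p-value convention are assumptions, and the function names are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope, s_XY / s_XX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx


def permutation_test_slope(x, y, n_perm=999, seed=0):
    """Permutation test for the regression slope (illustrative sketch).

    Shuffles the Y values relative to X, refits the slope for each
    permutation set, and reports the proportion of slopes at least as
    extreme (in absolute value) as the observed one.
    """
    rng = random.Random(seed)    # seeded so the result is repeatable
    observed = slope(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)      # randomize the X-Y associations
        if abs(slope(x, y_perm)) >= abs(observed):
            extreme += 1
    # Assumed convention: the observed data count as one permutation
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data, for illustration only
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
obs, p = permutation_test_slope(x, y)
print(f"observed slope = {obs:.3f}, permutation p = {p:.3f}")
```

Because only the pairing of X and Y is shuffled, every permutation set has the same marginal distributions of X and Y as the original data.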