It is common to regard high r² values as indicating high explanatory power of the
model. However, even high values of r² need not be statistically significantly greater than
zero. For that reason we need to test the statistical significance of r², which we can do
(assuming normality of the residuals) using the expression:
$$ z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right) \tag{8.12} $$
which is an approximately normally distributed variable, with variance equal to 1/(N − 3), where N is the
sample size (see the derivation in Freund and Walpole, 1980), a calculation that assumes
that the residuals are independent and normally distributed. So, based on an analytic
model of the distribution of r values, we can test whether or not the variance explained by
the model is larger than we would expect by chance.
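As a rough illustration of the analytic test, the sketch below computes z from an observed r and compares it with a normal distribution that has mean zero and variance 1/(N − 3). The function name `fisher_z_test`, the one-sided alternative (correlation greater than zero), and the example values r = 0.5 and N = 28 are assumptions made for this illustration; they are not taken from the text.

```python
import math


def fisher_z_test(r, n):
    """One-sided test of 'no correlation' using the z transformation.

    Hypothetical helper for illustration only. Returns the transformed
    value z, the standardized statistic z / sqrt(1 / (n - 3)), and the
    upper-tail probability under a standard normal distribution.
    """
    if n <= 3:
        raise ValueError("sample size must exceed 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")

    # z = (1/2) ln((1 + r) / (1 - r))
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))

    # Under the null hypothesis z has mean 0 and variance 1 / (n - 3).
    standard_error = 1.0 / math.sqrt(n - 3)
    statistic = z / standard_error

    # Upper-tail probability of a standard normal, via the complementary
    # error function: P(Z >= s) = 0.5 * erfc(s / sqrt(2)).
    p_value = 0.5 * math.erfc(statistic / math.sqrt(2.0))
    return z, statistic, p_value


# Made-up example values, not data from the text.
z, statistic, p_value = fisher_z_test(r=0.5, n=28)
print(f"z = {z:.4f}, statistic = {statistic:.4f}, p = {p_value:.4f}")
```

A small p-value here would suggest that the observed correlation is larger than expected by chance, subject to the independence and normality assumptions stated above.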
The other approach to testing the significance of an observed r value is to use a permutation
test of the significance of the regression, an approach which dates back to Fisher
(1935). The null hypothesis we would like to reject can be stated as:
H₀: The variance explained by this model for this particular data set is no greater than might occur by
chance, meaning that there is no association between the X and Y values that differs from what we might
expect to occur randomly.
This hypothesis contains a statement about the exchangeability (Anderson, 2001b) of the
X and Y variables in our data set, namely, that the relationship between X and Y is
exchangeable. That is because, if the null hypothesis (H₀) is true and we randomly shuffle
the Xᵢ and Yᵢ values to create new pairings, permuting the original data, we would expect
the model to fit the permuted data as well as it fits the original data. Because the relationship
between X and Y is exchangeable under H₀, if H₀ is true, the model should have the
same predictive power for the permuted data as it did for the original data. That gives us
a basis for rejecting the null hypothesis: if we form a large number of permuted data
sets, we can determine how many of them have as large an r value as the original data set
did. If only 3% of the permuted data sets have as large an r value as the original data
set does, we can use this observed 3% rate to estimate that, if H₀ were true, there would be
only about a 3% chance of obtaining an r value as large as the one observed. Permutation
methods are discussed in more detail later (see the Appendix of this chapter and the discussion
of permutations in the next chapter). It is important to note, however, that the permutation
method used here does assume that the residuals are independent of one another, just as
the analytic model did. The permutation method also assumes that the residuals came from the
same distribution, but does not require that distribution to be normal, which distinguishes it
from the analytic model discussed earlier.
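The permutation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `pearson_r` and `permutation_p_value`, the default of 999 permutations, the fixed random seed, and the example data are all assumptions. The sketch shuffles the Y values relative to the X values and reports the proportion of permuted data sets whose r is at least as large as the observed r.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Proportion of permuted data sets with r >= the observed r.

    Hypothetical helper for illustration. Shuffling y re-pairs the X and Y
    values at random, which is what exchangeability under the null
    hypothesis permits. To test r squared instead, compare squared values.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    r_observed = pearson_r(x, y)
    y_permuted = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_permuted)
        if pearson_r(x, y_permuted) >= r_observed:
            count += 1
    return r_observed, count / n_perm


# Made-up example data, not taken from the text.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9]
r_observed, p_value = permutation_p_value(x, y)
print(f"observed r = {r_observed:.4f}, permutation p = {p_value:.4f}")
```

Note that the sketch never refers to a normal distribution; it relies only on the re-pairing of X and Y values being equally likely under the null hypothesis.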
MULTIVARIATE REGRESSION
To apply this theory to shape we need to extend it to the multivariate case. Our dependent
variable, for the case of two-dimensional data consisting solely of landmarks, is a vector
with 2K − 4 components. That number will need to be adjusted for three-dimensional