where Y_i is the dependent variable measured for the ith specimen, m is the slope of the line, b is the Y-intercept of the line, and ε_i is the "error" (the variation in Y not explained by X). Our objective is to estimate m and b and then to determine whether they are statistically different from zero. This is the model for any hypothesis in which the predictor is a
continuous variable. Size exemplifies a continuous variable because there is always a size
between any two others. In the case of categorical factors, such as the other factor that we
will consider (sex), we cannot find values between any two of them. There are no values
between “male” and “female”.
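To make this model concrete, here is a minimal simulation sketch in Python (all parameter values are hypothetical, chosen purely for illustration) that generates data of the form Y_i = mX_i + b + ε_i from a continuous predictor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters, chosen only for illustration.
m_true, b_true = 2.0, 1.0           # slope and Y-intercept
x = rng.uniform(0, 10, size=50)     # continuous predictor (e.g. size)
eps = rng.normal(0, 1.0, size=50)   # error: variation in Y not explained by X

# Each observed value is the deterministic part plus its own error term.
y = m_true * x + b_true + eps
```

A categorical factor such as sex would instead enter a model as group labels (e.g. 0/1), with no intermediate values possible.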
When the assumption of linearity holds, our statistical analysis can tell us whether Y is only weakly dependent on X, meaning that knowledge of X does not enable us to predict Y. It is also possible that the relationship between the two variables is statistically significant, but
that m is such a small number that the effect of X on Y is biologically trivial. It may be a statistically significant relationship, in that it is stronger than expected by chance, but it might not be biologically significant. Recognizing this distinction is important, because statistical significance is a matter of sample size and the power of a test. With very large samples, or very powerful tests, we might have little difficulty rejecting the null hypothesis. However, if X accounts for very little of the variation in Y, X provides little biological insight into Y. We therefore need to pay as much attention to the explanatory power of X, and to the magnitude of its impact on Y, as to the statistical results. The fraction of the variance in Y
explained by X (and the model) provides the needed information about explanatory power.
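The gap between statistical and biological significance is easy to demonstrate numerically. In this sketch (hypothetical numbers, pure NumPy), a very large sample makes a tiny slope highly significant even though X explains a negligible fraction of the variance in Y:

```python
import numpy as np

rng = np.random.default_rng(1)

# A huge sample with a real but biologically trivial effect of X on Y.
n = 100_000
x = rng.normal(size=n)
y = 0.03 * x + rng.normal(size=n)   # slope is nonzero but tiny

# Correlation and explained variance.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2

# t statistic for the null hypothesis m = 0.
t = r * np.sqrt((n - 2) / (1 - r_squared))

# |t| is far beyond conventional significance thresholds,
# yet X explains well under 1% of the variance in Y.
```

Here the null hypothesis is soundly rejected, but r² tells us that X has almost no explanatory power, exactly the distinction drawn above.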
As mentioned above, when the assumption of linearity holds, our statistical analysis can tell
us whether we can predict Y from X. The reason for emphasizing this assumption is that a
strong but non-linear relationship might look like a weak linear one. Consequently, we might reject our biological model because the statistical analysis suggests a weak relationship between the variables, when the relationship is actually strong but non-linear.
Fortunately, in some cases of a non-linear relationship between the variables, it is easy to
transform the independent variable to make the relationship linear. For example, a number
of studies of ontogenetic allometry use the logarithm of centroid size, rather than centroid
size itself, as the independent variable. That transformation is useful when most of the
shape change occurs over small values of X, such as when most shape change occurs early
in ontogeny (as it often does). We should note that it does not matter whether the logarithm is taken to base 10 (log) or base e (ln), because the two differ only by a constant factor, i.e. log(X) = log(e) ln(X) ≈ 0.4343 ln(X). In other cases, other transformations of X (such as trigonometric functions) might do a better job of linearizing the relationship between the variables.
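The effect of the log transformation can be sketched with simulated data (hypothetical values): when most shape change occurs at small sizes, shape is more nearly linear in log(size) than in size itself, and the base of the logarithm makes no difference:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ontogenetic series: most shape change occurs at small sizes.
size = rng.uniform(1, 100, size=200)
shape = 0.5 * np.log(size) + rng.normal(0, 0.05, size=200)

# Linear correlation is stronger on the log scale than on the raw scale.
r_raw = np.corrcoef(size, shape)[0, 1]
r_log = np.corrcoef(np.log(size), shape)[0, 1]

# The base is irrelevant up to a constant: log10(X) = log10(e) * ln(X).
assert np.allclose(np.log10(size), np.log10(np.e) * np.log(size))
```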
In a moment, we will present the equations that provide the best estimates of m and b,
but to explain why they are considered “best” we first need to consider how that decision
could be made, in general. The standard approach for deriving the best estimator is to
choose an error function. By minimizing that error, we find the optimal values for the parameters. A least squares analysis, as the term suggests, uses the sum of squared residuals
as the error function, so that is the function minimized. We then express the relationship
between that error term and the regression model:
∑_{i=1}^{N} ε_i² = ∑_{i=1}^{N} (y_i − m x_i − b)²     (8.2)
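As a numerical illustration, the sketch below (simulated data with hypothetical parameter values) computes the standard closed-form least-squares estimates, m = cov(x, y)/var(x) and b = mean(y) − m·mean(x), stated here without derivation, and checks that perturbing either one only increases the sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with known (hypothetical) parameters m = 2, b = 1.
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

def sse(m, b):
    """Sum of squared residuals: the error function minimized in Eq. 8.2."""
    return np.sum((y - m * x - b) ** 2)

# Standard closed-form least-squares estimates.
m_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b_hat = y.mean() - m_hat * x.mean()

# Moving away from the estimates can only increase the squared error.
assert sse(m_hat, b_hat) <= sse(m_hat + 0.05, b_hat)
assert sse(m_hat, b_hat) <= sse(m_hat, b_hat + 0.5)
```

With these simulated data the estimates land close to the values used to generate them, as the theory predicts.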