where $X_i$ is the $i$th element in the list. In our example, $\langle X \rangle = 3.52$. Of course, we also
need to quantify our uncertainty in this value. If we assume that the distribution of $X$ fits
the model of a normal distribution, then the standard error of the mean is given by the
standard deviation, $\sigma$, divided by the square root of $N$ (the number of observed individuals).
The standard deviation is:

$$\sigma = \left( \frac{\sum_{i=1}^{N} \left( X_i - \langle X \rangle \right)^2}{N - 1} \right)^{1/2} \qquad \text{(8A.3)}$$

so the standard error of the mean (SEM) is:

$$\mathrm{SEM} = \frac{\sigma}{\sqrt{N}} = \left( \frac{\sum_{i=1}^{N} \left( X_i - \langle X \rangle \right)^2}{N (N - 1)} \right)^{1/2} \qquad \text{(8A.4)}$$

For our example, $\sigma = 1.69$ and $N = 31$, so $\mathrm{SEM} = 1.69/(31)^{1/2} = 0.304$.
The 95% confidence interval for the mean, assuming a normal distribution, ranges from
$\langle X \rangle - 1.96(\mathrm{SEM})$ to $\langle X \rangle + 1.96(\mathrm{SEM})$ because, for a normal or Gaussian distribution,
95% of the values in the distribution lie within 1.96 standard deviations of the mean. In
our example, $\mathrm{SEM} = 0.304$ and $\langle X \rangle = 3.52$, so $1.96(\mathrm{SEM}) \approx 0.60$ and the 95% confidence interval is
from 2.92 to 4.12. Suppose that we want to claim that the average body length of this
population is greater than 3.0 cm. Again, using the normal distribution, we can calculate that
the chance of the mean being less than or equal to 3.0 is 0.049 (4.9%), so we can reject the
hypothesis that the mean is less than or equal to 3.0 at the 5% significance level, meaning that
we accept a 5% chance of rejecting the null model when it was true (Type I error).
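The arithmetic in this example can be checked with a short script. What follows is a minimal sketch, assuming only the summary values quoted in the text (mean 3.52, standard deviation 1.69, N = 31) rather than the raw measurements; the variable names are illustrative, and Python's standard-library `statistics.NormalDist` stands in for a table of the normal distribution. Two small differences from the quoted figures should be expected. The text rounds 1.96(SEM) to 0.60 before adding and subtracting, so the computed bounds can differ in the second decimal place. The one-sided tail probability under a strictly normal model also comes out slightly below the figure quoted in the text, which appears closer to what a Student's t distribution with N - 1 = 30 degrees of freedom would give.

```python
from math import sqrt
from statistics import NormalDist

# Summary values quoted in the text (not raw data).
mean_x = 3.52   # sample mean body length, cm
sigma = 1.69    # sample standard deviation, cm
n = 31          # number of observed individuals

# Standard error of the mean, Eq. 8A.4: SEM = sigma / sqrt(N)
sem = sigma / sqrt(n)

# Two-sided 95% confidence interval under the normal model.
z95 = NormalDist().inv_cdf(0.975)   # approximately 1.96
ci_low = mean_x - z95 * sem
ci_high = mean_x + z95 * sem

# One-sided probability that the mean is <= 3.0, treating the
# sampling distribution of the mean as normal with sd = SEM.
p_le_3 = NormalDist(mu=mean_x, sigma=sem).cdf(3.0)

print(f"SEM = {sem:.3f}")
print(f"95% CI = {ci_low:.2f} to {ci_high:.2f}")
print(f"P(mean <= 3.0) = {p_le_3:.3f}")
```

Either way, the probability falls below 0.05, so the conclusion drawn in the text (rejecting the hypothesis that the mean is 3.0 or less) is unchanged.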
What difficulties arise in this example? First, we have assumed that the distribution is
normal. This is important even though statistics based on the normal distribution are
known to be robust to violations of the assumptions of normality. Nevertheless, as the
distribution departs further from normality, larger errors appear in the results, leading to
increased error rates. The validity of the normal distribution for our example has not been
determined. Is that assumption reasonable? If the distribution is normal, 1.9% of the
measurements will be less than or equal to zero (that is the expectation under the model).
Does that pose a problem? Yes, because we are measuring lengths, and none can be less
than zero under any circumstances; in fact, the lower bound may be substantially larger
than zero (due to physiological constraints on the size of the organism). So we know that
our distribution must deviate from the normal distribution, at least with respect to the
expectation that some values will be zero or less. Perhaps that deviation has only a small effect on
our estimate of SEM, but we are relying on the reputation of the normal distribution as a
robust estimator to reassure ourselves about that. We really do not know what effect that
lower bound has on our statistical inferences. We could of course transform our values,
subtracting the mean from all of them, for example. And there are other distribution
models besides the normal, or we could use other transformations (taking the natural log, for
example), in an effort to arrive at normally distributed variables.
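The 1.9% figure quoted in this discussion follows directly from the normal model. As a minimal sketch, assuming the same summary values as before (mean 3.52, standard deviation 1.69) and using the standard-library `statistics.NormalDist`, the expected fraction of measurements at or below zero is the normal cumulative distribution evaluated at zero. The variable names are illustrative.

```python
from statistics import NormalDist

# Summary values quoted in the text.
mean_x = 3.52   # sample mean body length, cm
sigma = 1.69    # sample standard deviation, cm

# Normal model for individual measurements (not for the mean),
# so the spread is sigma rather than the SEM.
length_model = NormalDist(mu=mean_x, sigma=sigma)

# Fraction of measurements the model expects at or below zero.
frac_nonpositive = length_model.cdf(0.0)

print(f"{frac_nonpositive:.1%}")   # about 1.9%
```

A normal model fitted to log-transformed lengths would assign no probability to non-positive lengths, which is one motivation for the log transformation mentioned above.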
The other difficulty we face is the lack of an exact formula for the standard error of many
statistics, or of functions of statistics that we might want to work with. Suppose we want to