Statistics - Geometric Morphometrics for Biologists

know the standard error in the median of the distribution. We can calculate the median of our measurements of X, which equals 3.0, but can we actually conclude that the median of the population is greater than 2.0? We do not really know the range of values that the median might take on for this distribution, and the normal model provides no estimate of the uncertainty in the median. The standard deviation and variance of populations are also of tremendous biological interest, but how do we estimate the range of values for these statistics?

Resampling-Based Methods

Having noted that we can face serious difficulties when we assume a normal distribution and rely on the theory based on it, we now examine methods that allow us to make statistical inferences without assuming any distribution.

The Bootstrap

We begin with the bootstrap because it is probably the easiest to understand. It was not the first computer-based statistical method developed; in fact it is one of the more recent (it was developed from jackknife and permutation methods). The term “bootstrapping” comes from the novel Baron Münchausen's Narrative of his Marvelous Travels and Campaigns in Russia, by Rudolph Erich Raspe (1785), in which the Baron falls to the bottom of a deep lake. He cannot figure out what to do until, at the last moment, he thinks to pull himself up by his own bootstraps. This describes, fairly accurately, the approach used in a bootstrap procedure: the observed data themselves are used as a basis for resampling. We will approximate the unknown statistical distribution from which the data were drawn by (randomly) resampling our data.

A bootstrap set is a set of data of the same sample size as the original data set, whose elements are randomly drawn with replacement from our original set of observations. To draw an element randomly (with replacement) from a set of N elements, a random number generator produces a uniformly distributed random integer from 1 to N. The corresponding element from the original set of observations then forms the first element in the bootstrap set. For example, given our 31 observations, we will construct a sample that also has 31 observations. Suppose the number provided by the random number generator is 8; we then take the value of the eighth individual of our sample as the first value in the bootstrap set. This procedure is repeated until the bootstrap set contains N values. Note that a single value from the original data set may appear multiple times in a bootstrap set because we are sampling with replacement, meaning that we do not remove an individual from the sample after we have placed its value in the bootstrap set. Because the bootstrap set still has only N values, some original values might not appear in it at all.
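The drawing procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bootstrap_set` and the five example values are hypothetical, and Python's standard `random` module stands in for the random number generator.

```python
import random


def bootstrap_set(observations, rng=None):
    """Draw one bootstrap set: N values sampled with replacement."""
    if rng is None:
        rng = random.Random()
    n = len(observations)
    result = []
    for _ in range(n):
        # Uniformly distributed random integer from 1 to N (inclusive).
        k = rng.randint(1, n)
        # Python lists are 0-indexed, so the k-th observation is at k - 1.
        # The original list is never modified: sampling is with replacement.
        result.append(observations[k - 1])
    return result


# Hypothetical measurements, for illustration only (not data from the text).
data = [2.1, 3.0, 3.4, 1.8, 4.2]
sample = bootstrap_set(data, random.Random(0))
print(sample)
```

The bootstrap set always has the same length as the original data, and any value may occur in it several times or not at all.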

To see how a bootstrap set is formed, we consider an abstract, symbolic example.

Suppose $C$ contains five values:

$$C = \{C_1, C_2, C_3, C_4, C_5\} \tag{8A.5}$$

To form a bootstrap version of $C$, we generate a list of five random numbers, each independently chosen and ranging from 1 to 5 (because $N = 5$):

$$L = \{5, 2, 4, 3, 5\} \tag{8A.6}$$
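Reading each number in the list (5, 2, 4, 3, 5) as a position in C gives the bootstrap set. A short Python sketch of that lookup, assuming string labels as stand-ins for the symbolic values (the variable names are illustrative only):

```python
# Symbolic values C_1 ... C_5, represented here as labels.
C = ["C1", "C2", "C3", "C4", "C5"]

# The list of random indices (1-based, as in the text).
L = [5, 2, 4, 3, 5]

# Pick the element of C named by each index; subtract 1 for 0-based lists.
C_boot = [C[i - 1] for i in L]
print(C_boot)  # ['C5', 'C2', 'C4', 'C3', 'C5']
```

In this draw the fifth value appears twice and the first value does not appear at all, which is the expected behavior when sampling with replacement.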
