Biology Reference
In-Depth Information
know the standard error in the median of the distribution. We can calculate the median of
our measurements of X, which equals 3.0, but can we actually conclude that the median of
the population is greater than 2.0? We do not really know the range of values that the median
might take on for this distribution, and the normal model provides no estimate of the uncertainty
in the median. The standard deviation and variance of populations are also of tremendous
biological interest, but how do we estimate the range of values for these statistics?
Resampling-Based Methods
Having noted that we can face serious difficulties when we assume a normal distribution
and rely on the theory based on it, we now examine methods that allow us to make
statistical inferences without assuming any distribution.
The Bootstrap
We begin with the bootstrap because it is probably the easiest to understand. It was not
the first computer-based statistical method developed; in fact it is one of the more recent
(it was developed from jackknife and permutation methods). The term “bootstrapping”
comes from the novel Baron Münchausen's Narrative of his Marvelous Travels and Campaigns
in Russia, by Rudolph Erich Raspé (1785), in which the Baron falls to the bottom of a deep
lake. He cannot figure out what to do until, at the last moment, he thinks to pull himself
up by his own bootstraps. This describes, fairly accurately, the approach used in a bootstrap
procedure: the observed data themselves are used as a basis for resampling. We will
approximate the unknown statistical distribution from which the data were drawn by
(randomly) resampling our data.
A bootstrap set is a set of data of the same sample size as the original data set, whose
elements are randomly drawn with replacement from our original set of observations. To
draw them randomly (with replacement) from a set of N elements, a uniformly distributed
random integer from 1 to N is generated by a random number generator. The corresponding
element from the original set of observations then forms the first element in the bootstrap
set. For example, given our 31 observations, we will construct a sample that also has
31 observations. If the first number provided by the random number generator is 8, we take
the value of the eighth individual of our sample as the first value in the bootstrap set. This
procedure is repeated until the bootstrap set contains N values. Note that a single value from
the original data set may appear multiple times in a bootstrap set because we are sampling
with replacement, meaning that we do not remove an individual from the sample after we have
placed its value in the bootstrap set. As a result, some values might not appear at all in the
bootstrap set.
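The drawing procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name bootstrap_set and the 31 placeholder observations are hypothetical and do not come from the text, and the random seed is fixed only so that the run is repeatable.

```python
import random


def bootstrap_set(observations, rng):
    """Draw one bootstrap set: N values sampled with replacement from N observations."""
    n = len(observations)
    resampled = []
    for _ in range(n):
        # Uniformly distributed random integer from 1 to N (inclusive), as in the text.
        index = rng.randint(1, n)
        # Python lists are 0-based, so position 1 is observations[0].
        resampled.append(observations[index - 1])
    return resampled


# Hypothetical placeholder data: 31 made-up values, not the measurements of X.
observations = [round(0.5 * i, 1) for i in range(1, 32)]

boot = bootstrap_set(observations, random.Random(42))
print(len(boot))  # same sample size as the original data set: 31

# Values of the original set that never appear in this bootstrap set.
missing = sorted(set(observations) - set(boot))
print(missing)
```

Because each draw leaves the original set unchanged, some values typically appear more than once in boot and others end up in missing. The standard library call rng.choices(observations, k=len(observations)) performs the same kind of draw in one step.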
To see how a bootstrap set is formed, we consider an abstract, symbolic example.
Suppose C contains five values:

$$ C = \{C_1, C_2, C_3, C_4, C_5\} \tag{8A.5} $$

To form a bootstrap version of C, we generate a list of five random numbers, each
independently chosen and ranging from 1 to 5 (because N = 5):

$$ L = \{5, 2, 4, 3, 5\} \tag{8A.6} $$
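As a small sketch of how such an index list might be used, the snippet below reads each number in L as a position in C, following the drawing procedure described earlier. That reading, the string labels "C1" to "C5", and the name bootstrap_C are assumptions made for illustration only.

```python
# Symbolic stand-ins for the five values of C (labels are illustrative).
C = ["C1", "C2", "C3", "C4", "C5"]

# The five random indices from Equation 8A.6, each between 1 and N = 5.
L = [5, 2, 4, 3, 5]

# Pick the element at each 1-based position; subtract 1 for Python's 0-based lists.
bootstrap_C = [C[i - 1] for i in L]

print(bootstrap_C)  # ['C5', 'C2', 'C4', 'C3', 'C5']
```

Under this reading, the fifth value is drawn twice and the first value is not drawn at all, which is the behavior expected when sampling with replacement.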