The numbers in $L$ are the ordinal positions of the elements of $C$, so $C_{Bootstrap}$ contains
the corresponding values of $C$ (e.g. $L_1 = 5$, so it corresponds to the fifth element of $C$,
which is $C_5$). Thus:
$C_{Bootstrap} = \{C_5, C_2, C_4, C_3, C_5\}$ (8A.7)
Note that $C_5$ appears twice in this bootstrap set whereas $C_1$ does not appear even once.
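The index-based selection can be illustrated with a minimal Python sketch. The strings standing in for the elements of $C$ are placeholders (no numerical values are given for $C$ here), and the list `L` is read off from equation 8A.7.

```python
# Placeholder labels standing in for C_1..C_5; no numerical values are assumed.
C = ["C1", "C2", "C3", "C4", "C5"]

# 1-based ordinal positions, as implied by equation 8A.7.
L = [5, 2, 4, 3, 5]

# Python lists are 0-based, so subtract 1 from each ordinal position.
C_bootstrap = [C[i - 1] for i in L]

print(C_bootstrap)  # ['C5', 'C2', 'C4', 'C3', 'C5']
```

Because the positions are drawn with replacement, an element can be selected more than once or not at all.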
Returning to the numerical example presented earlier:
$X = \{2, 2, 3, 4, 2, 5, 3, 2, 6, 2, 3, 4, 6, 2, 1, 4, 3, 7, 2, 3, 4, 4, 5, 8, 5, 2, 1, 3, 4, 4, 3\}$ (8A.8)
To form a bootstrap set, $X_{Boot}$, from $X$, we generate the list, $B$, of 31 random numbers:
$B = \{30, 8, 19, 16, 28, 24, 15, 1, 26, 14, 20, 25, 29, 23, 6, 13, 29, 13, 28, 2, 11, 26, 1, 5, 7, 7, 19, 9, 7, 1\}$ (8A.9)
We then select the elements of $X$ corresponding to those ordinal values:
$X_{Boot} = \{4, 2, 2, 4, 3, 8, 1, 2, 2, 2, 3, 5, 4, 5, 5, 6, 4, 4, 6, 3, 2, 3, 2, 2, 2, 3, 3, 2, 6, 3, 2\}$ (8A.10)
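The selection step above can be sketched in Python as follows. The function name `bootstrap_set` and the seed are assumptions made for illustration, and the positions drawn here are fresh random numbers, so the resulting set will generally differ from the one in equation 8A.10.

```python
import random

# The 31 measured lengths from equation 8A.8.
X = [2, 2, 3, 4, 2, 5, 3, 2, 6, 2, 3, 4, 6, 2, 1, 4,
     3, 7, 2, 3, 4, 4, 5, 8, 5, 2, 1, 3, 4, 4, 3]


def bootstrap_set(data, rng):
    """Draw len(data) elements from data with replacement."""
    n = len(data)
    # 1-based ordinal positions, playing the role of the list B in the text.
    positions = [rng.randint(1, n) for _ in range(n)]
    # Convert each 1-based position to a 0-based list index.
    return [data[p - 1] for p in positions]


rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
X_boot = bootstrap_set(X, rng)
print(X_boot)
```

The bootstrap set always has the same size as the original sample and contains only values that occur in it.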
We can now calculate the mean, standard deviation and median of $X_{Boot}$:
$\langle X_{Boot} \rangle = 3.39$, $\sigma_{X_{Boot}} = 1.62$, and median($X_{Boot}$) $= 3$. These values
are slightly different from those of the original distribution, $\langle X \rangle = 3.52$,
$\sigma = 1.69$, and median($X$) $= 3.0$. To arrive at an estimate of the confidence intervals for
these statistics, we will compute a large number ($N_{Bootstrap}$) of bootstrap sets. We will then
determine the 95% confidence interval over the $N_{Bootstrap}$ sets, forming a bootstrap estimate of
the confidence intervals on the mean, standard deviation and the median. If we generate 200
bootstrap sets based on $X$, we find that the 95% confidence interval for the mean is 3.00 to 4.10;
for the standard deviation the confidence interval is 1.23 to 2.10, and for the median it is 3.00
to 4.00. The normal model predicted a 95% confidence interval for the mean of 2.91 to 4.12, so the
two methods approximately agree. They appear to differ at the lower boundary (at small lengths),
which is where we expect departures from the normal distribution, for the reasons discussed earlier.
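A minimal Python sketch of this procedure is given below. It assumes the 95% interval is taken as the central range of the sorted bootstrap statistics (a percentile interval); the text does not state exactly how its interval was determined. The helper names `resample` and `percentile_interval` and the seed are illustrative assumptions, and because the random draws differ, the limits will not reproduce the quoted values exactly.

```python
import random
import statistics

# The 31 measured lengths from equation 8A.8.
X = [2, 2, 3, 4, 2, 5, 3, 2, 6, 2, 3, 4, 6, 2, 1, 4,
     3, 7, 2, 3, 4, 4, 5, 8, 5, 2, 1, 3, 4, 4, 3]


def resample(data, rng):
    """One bootstrap set: len(data) draws from data with replacement."""
    return [rng.choice(data) for _ in data]


def percentile_interval(values, level=0.95):
    """Central interval left after trimming equal tails from the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    k = round(n * (1.0 - level) / 2.0)  # number of values dropped from each tail
    return ordered[k], ordered[n - 1 - k]


rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
n_bootstrap = 200       # number of bootstrap sets, as in the text

sets = [resample(X, rng) for _ in range(n_bootstrap)]

# statistics.stdev is the sample (n - 1) standard deviation.
ci_mean = percentile_interval([statistics.mean(s) for s in sets])
ci_sd = percentile_interval([statistics.stdev(s) for s in sets])
ci_median = percentile_interval([statistics.median(s) for s in sets])

print("mean:  ", ci_mean)
print("sd:    ", ci_sd)
print("median:", ci_median)
```

With only 200 sets the interval limits shift noticeably from one run to the next; increasing `n_bootstrap` makes them more stable.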
The approach outlined here may be extended to virtually any statistic and to any function,
univariate or multivariate. For example, we can use it to perform t-tests, which are
used to compare the means of two samples. It is possible that the difference in numerical
values of two means is due solely to an arbitrary division of one group into two. Because
of the variation within the population, drawing two samples from it can result in two samples
that differ numerically in their means.
Let us look again at our sample of 31 measured lengths:
$X = \{2, 2, 3, 4, 2, 5, 3, 2, 6, 2, 3, 4, 6, 2, 1, 4, 3, 7, 2, 3, 4, 4, 5, 8, 5, 2, 1, 3, 4, 4, 3\}$ (8A.11)
and consider a second group of 18 lengths:
$Y = \{2, 2, 3, 2, 4, 2, 3, 2, 8, 9, 2, 9, 3, 2, 3, 3, 3, 9\}$ (8A.12)
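One way to compare the two means by resampling is sketched below in Python. It is an assumption about how such a test could be set up, not necessarily the procedure this text goes on to describe: the two groups are pooled, as if they were an arbitrary division of one group, and pairs of bootstrap sets of sizes 31 and 18 are drawn from the pool. The number of resamples (1000) and the seed are arbitrary choices.

```python
import random
import statistics

# The two samples from equations 8A.11 and 8A.12.
X = [2, 2, 3, 4, 2, 5, 3, 2, 6, 2, 3, 4, 6, 2, 1, 4,
     3, 7, 2, 3, 4, 4, 5, 8, 5, 2, 1, 3, 4, 4, 3]
Y = [2, 2, 3, 2, 4, 2, 3, 2, 8, 9, 2, 9, 3, 2, 3, 3, 3, 9]

# Observed difference between the two sample means.
observed = statistics.mean(Y) - statistics.mean(X)

# Under the hypothesis that X and Y are an arbitrary split of one group,
# both can be resampled from the pooled data.
pooled = X + Y
rng = random.Random(2)  # arbitrary seed, chosen only for repeatability
n_bootstrap = 1000      # assumed number of resamples

extreme = 0
for _ in range(n_bootstrap):
    x_star = [rng.choice(pooled) for _ in X]  # 31 draws with replacement
    y_star = [rng.choice(pooled) for _ in Y]  # 18 draws with replacement
    diff = statistics.mean(y_star) - statistics.mean(x_star)
    if abs(diff) >= abs(observed):
        extreme += 1

# Fraction of resampled differences at least as large as the observed one.
p_value = extreme / n_bootstrap
print(round(observed, 3), p_value)
```

A small `p_value` would indicate that a difference as large as the observed one rarely arises from an arbitrary division of the pooled data.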