Environmental Engineering Reference
In-Depth Information
$$
m + t_{0.025} \cdot \frac{s}{n^{0.5}} \le \mu \le m + t_{0.975} \cdot \frac{s}{n^{0.5}} \tag{1.21}
$$

where $t_{0.025}$ and $t_{0.975}$ are, respectively, the 0.025 and 0.975 percentiles of the
Student's t-distribution with n − 1 DOF.
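As a quick numerical check of Equation 1.21, the short Python sketch below plugs in the values this section reports for its n = 10 sample (m = 101.84, s = 23.82). The critical value 2.2622 is an assumption taken from standard Student's t tables for 9 DOF, and the variable names are illustrative only.

```python
import math

# Values reported in this section for the n = 10 sample.
n = 10
m = 101.84  # sample mean
s = 23.82   # sample standard deviation

# Assumption: tabulated Student's t percentiles for n - 1 = 9 DOF.
t_0975 = 2.2622
t_0025 = -t_0975  # the t-distribution is symmetric about zero

std_err = s / math.sqrt(n)     # s / n^0.5
mu_lower = m + t_0025 * std_err
mu_upper = m + t_0975 * std_err

print(f"95% CI for mu: [{mu_lower:.2f}, {mu_upper:.2f}]")
# prints: 95% CI for mu: [84.80, 118.88]
```

Because $t_{0.025}$ is negative, the left-hand side of Equation 1.21 is the lower bound even though it is written with a plus sign.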
For the sample standard deviation, s, if Y is indeed normally distributed (again, which may
not be true), the standardized $(n - 1)s^2/\sigma^2$ is distributed as the χ-squared distribution
with (n − 1) DOF. An empirical example of this χ-squared distribution with 9 DOFs is shown in …

$$
(n - 1) \cdot \frac{s^2}{\chi^2_{0.975}} \le \sigma^2 \le (n - 1) \cdot \frac{s^2}{\chi^2_{0.025}} \tag{1.22}
$$

where $\chi^2_{0.025}$ and $\chi^2_{0.975}$ are, respectively, the 0.025 and 0.975 percentiles of the
χ-squared distribution with (n − 1) DOF.
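Equation 1.22 can be checked the same way. The sketch below uses the reported s = 23.82 and n = 10; the two χ-squared percentiles for 9 DOF are assumptions taken from standard tables. Equation 1.22 bounds σ², so the square roots of its bounds give the interval for σ.

```python
import math

# Values reported in this section for the n = 10 sample.
n = 10
s = 23.82  # sample standard deviation

# Assumption: tabulated chi-squared percentiles for n - 1 = 9 DOF.
chi2_0025 = 2.7004
chi2_0975 = 19.0228

scaled = (n - 1) * s**2
var_lower = scaled / chi2_0975   # lower bound on sigma^2
var_upper = scaled / chi2_0025   # upper bound on sigma^2

# Square roots convert the bounds on sigma^2 into bounds on sigma.
sigma_lower = math.sqrt(var_lower)
sigma_upper = math.sqrt(var_upper)

print(f"95% CI for sigma: [{sigma_lower:.2f}, {sigma_upper:.2f}]")
# prints: 95% CI for sigma: [16.38, 43.49]
```

The larger percentile appears in the lower bound because dividing by a larger number gives a smaller value.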
1.2.3.5.2 Bootstrapping
Equations 1.20 through 1.22 are based on the strong assumption that the data are normally
distributed. In practice, we do not know the distribution of the data. We can test for
normality using the K-S test described below, but it would be convenient to obtain confidence
intervals for μ and σ without making an assumption about the distribution. Nonparametric
bootstrapping (Efron and Tibshirani 1993) is a general framework for obtaining approximate
samples from the sampling distribution of any statistic. Let the statistic of interest be
denoted by g(Y(1), Y(2), …, Y(n)). For the sample mean, m, g(Y(1), Y(2), …, Y(n)) =
(Y(1) + Y(2) + … + Y(n))/n. The steps for bootstrapping are as follows:
1. Resample (Y(1), Y(2), …, Y(n)) with replacement. Denote the resampled Y by
(Y′(1), Y′(2), …, Y′(n)). Note that after the resampling there may be repeated values in
(Y′(1), Y′(2), …, Y′(n)), because they are resampled with replacement.
2. Evaluate g(Y′(1), Y′(2), …, Y′(n)). This is a resampled g value.
3. Repeat steps 1 and 2 to obtain B resampled g values. Note that B is distinct from n.
The B resampled g values can be viewed as approximate realizations of the sampling
distribution of g.
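The three steps above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the authors' implementation: the function names `bootstrap` and `percentile_interval` are hypothetical, the data are made-up values rather than the sample used in this section, and the percentile rule is one simple order-statistic convention among several. The helper at the end takes the 0.025 and 0.975 sample percentiles of the resampled values.

```python
import random
import statistics


def bootstrap(data, g, B, rng):
    """Return B resampled values of the statistic g (steps 1 to 3)."""
    n = len(data)
    values = []
    for _ in range(B):  # Step 3: repeat steps 1 and 2 B times.
        # Step 1: draw n values from the data with replacement.
        resample = rng.choices(data, k=n)
        # Step 2: evaluate the statistic on the resampled data.
        values.append(g(resample))
    return values


def percentile_interval(values, lower=0.025, upper=0.975):
    """Sample percentiles of the resampled values (simple order-statistic rule)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


# Hypothetical data for illustration only, NOT the sample used in the text.
data = [92.1, 118.4, 75.3, 104.9, 131.2, 88.7, 97.5, 110.3, 64.8, 125.6]
rng = random.Random(0)  # fixed seed so that the run is repeatable

m_values = bootstrap(data, statistics.mean, B=1000, rng=rng)
s_values = bootstrap(data, statistics.stdev, B=1000, rng=rng)

print("interval for the mean:", percentile_interval(m_values))
print("interval for the standard deviation:", percentile_interval(s_values))
```

Passing the statistic g as a function keeps the resampling loop identical for the mean, the standard deviation, or any other statistic of interest.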
Again, we initialized by randn('state', 13) before executing normrnd(100, 20, 10, 1). The
sample mean is m = 101.84 and the sample standard deviation is s = 23.82. These are the point
estimates for μ and σ. However, it is not clear how large the statistical uncertainties are.
Figure 1.10 shows the histograms of B = 1000 resampled m values and s values based on the
bootstrapping procedure. The 95% confidence intervals of μ and σ can be estimated as the
intervals bounded by the 0.025 and 0.975 sample percentiles of the resampled values. These
are called the 95% bootstrap confidence intervals, and this method is called the percentile
method (Efron 1981). For μ, the 95% bootstrap confidence interval is [88.37, 115.58]. This can
be compared to the 95% analytical confidence interval [84.80, 118.88] based on Equation 1.21.
For σ, the 95% bootstrap confidence interval is [15.51, 27.90]. This can be compared to the
95% analytical confidence interval [16.38, 43.49] based on Equation … small sample n = 10.
The problem of “insufficient coverage” for bootstrap confidence intervals of σ was discussed
in Schenker (1985): the probability that the bootstrap confidence interval covers the actual
value of σ is lower than expected. This problem may occur when the sample size (n) is small.
The bootstrap method is based on the assumption that the discrete samples