Environmental Engineering Reference
$$ m + t_{0.025} \cdot \frac{s}{n^{0.5}} \le \mu \le m + t_{0.975} \cdot \frac{s}{n^{0.5}} \tag{1.21} $$
where $t_{0.025}$ and $t_{0.975}$ are, respectively, the 0.025 and 0.975 percentiles of the Student's
t-distribution with n − 1 DOF.
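As a rough numerical check of Equation 1.21, the sketch below plugs in the sample values reported in the bootstrapping example of this section (m = 101.84, s = 23.82, n = 10). The percentile 2.262 is an assumption taken from a standard Student's t table for 9 DOF, and the variable names are illustrative only.

```python
import math

# Sample statistics quoted in this section for the n = 10 example
m, s, n = 101.84, 23.82, 10

# Assumed table value: 0.975 percentile of Student's t with n - 1 = 9 DOF
t_975 = 2.262
# The t-distribution is symmetric about zero, so the 0.025 percentile is the negative
t_025 = -t_975

standard_error = s / math.sqrt(n)      # s / n^0.5 in Equation 1.21
mu_lower = m + t_025 * standard_error
mu_upper = m + t_975 * standard_error

print(f"95% confidence interval for mu: [{mu_lower:.2f}, {mu_upper:.2f}]")
```

With these inputs the result should agree with the analytical interval [84.80, 118.88] quoted in the text.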
For the sample standard deviation, s, if Y is indeed normally distributed (which, again, may
not be true), the standardized quantity $(n-1)s^2/\sigma^2$ follows the χ-squared distribution with
(n − 1) DOF. An empirical example of this χ-squared distribution with 9 DOF is shown in
Figure 1.8b. One can then establish the 95% confidence interval of $\sigma^2$ by
$$ \frac{(n-1) \cdot s^2}{\chi^2_{0.975}} \le \sigma^2 \le \frac{(n-1) \cdot s^2}{\chi^2_{0.025}} \tag{1.22} $$
where $\chi^2_{0.025}$ and $\chi^2_{0.975}$ are, respectively, the 0.025 and 0.975 percentiles of the χ-squared
distribution with (n − 1) DOF.
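Equation 1.22 can be checked the same way. The sketch below uses the sample values reported in the bootstrapping example of this section (s = 23.82, n = 10). The two χ-squared percentiles for 9 DOF (2.700 and 19.023) are assumptions taken from a standard table, and the variable names are illustrative only.

```python
import math

# Sample statistics quoted in this section for the n = 10 example
s, n = 23.82, 10

# Assumed table values: percentiles of the chi-squared distribution with 9 DOF
chi2_025 = 2.700
chi2_975 = 19.023

# Equation 1.22 bounds the variance; the larger percentile gives the lower bound
var_lower = (n - 1) * s**2 / chi2_975
var_upper = (n - 1) * s**2 / chi2_025

# Taking square roots converts the variance interval into one for sigma
sigma_lower = math.sqrt(var_lower)
sigma_upper = math.sqrt(var_upper)

print(f"95% confidence interval for sigma: [{sigma_lower:.2f}, {sigma_upper:.2f}]")
```

With these inputs the result should agree with the analytical interval [16.38, 43.49] quoted in the text. Note that the interval is not symmetric about s, because the χ-squared distribution is skewed.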
1.2.3.5.2 Bootstrapping
Equations 1.20 through 1.22 rest on the strong assumption that the data are normally
distributed. In practice, we do not know the distribution of the data. We can test for
normality using the K-S test described below, but it would be convenient to obtain
confidence intervals for μ and σ without assuming any particular distribution.
Nonparametric bootstrapping (Efron and Tibshirani 1993) is a general framework for obtaining
approximate samples from the sampling distribution of any statistic. Let the statistic
of interest be denoted by g(Y(1), Y(2), …, Y(n)). For the sample mean, m, g(Y(1), Y(2), …,
Y(n)) = (Y(1) + Y(2) + … + Y(n))/n. The steps for bootstrapping are as follows:
1. Resample (Y(1), Y(2), …, Y(n)) with replacement. Denote the resampled Y by (Y′(1), Y′(2),
…, Y′(n)). Note that after the resampling there may be repeated values in
(Y′(1), Y′(2), …, Y′(n)), because they are drawn with replacement.
2. Evaluate g(Y′(1), Y′(2), …, Y′(n)). This is a resampled g value.
3. Repeat steps 1 and 2 to obtain B resampled g values. Note that B is distinct from n.
The B resampled g values can be viewed as approximate realizations of the sampling
distribution of g.
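The three steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the implementation used in the text: the function name bootstrap, the fixed seed, and the ten data values are all hypothetical, and the data are not the sample generated with normrnd in this section.

```python
import random
import statistics

def bootstrap(y, g, B, seed=0):
    """Return B resampled values of the statistic g for the sample y."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(y)
    resampled_g = []
    for _ in range(B):  # Step 3: repeat steps 1 and 2 B times
        # Step 1: draw n values from y with replacement (repeats may occur)
        y_prime = rng.choices(y, k=n)
        # Step 2: evaluate the statistic on the resampled data
        resampled_g.append(g(y_prime))
    return resampled_g

# Hypothetical data for illustration only (n = 10)
y = [92.1, 118.4, 75.3, 101.7, 130.2, 88.9, 97.5, 110.6, 83.0, 121.3]

m_values = bootstrap(y, statistics.mean, B=1000)   # resampled sample means
s_values = bootstrap(y, statistics.stdev, B=1000)  # resampled sample standard deviations
```

Passing the statistic g as a function keeps the resampling loop identical for the mean, the standard deviation, or any other statistic of interest.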
Again, we initialized the random number generator with randn('state', 13) before executing
normrnd(100, 20, 10, 1). The sample mean is m = 101.84 and the sample standard deviation is
s = 23.82. These are the point estimates for μ and σ, but on their own they do not show how
large the statistical uncertainties are. Figure 1.10 shows the histograms of B = 1000
resampled m values and s values based on the bootstrapping procedure. The 95% confidence
intervals of μ and σ can be estimated as the intervals bounded by the 0.025 and 0.975 sample
percentiles of the resampled values. Such an interval is called the 95% bootstrap confidence
interval, and this method is called the percentile method (Efron 1981). For μ, the 95%
bootstrap confidence interval is [88.37, 115.58]. This can be compared to the 95% analytical
confidence interval [84.80, 118.88] based on Equation 1.21. For σ, the 95% bootstrap
confidence interval is [15.51, 27.90]. This can be compared to the 95% analytical confidence
interval [16.38, 43.49] based on Equation 1.22. The difference between the bootstrap and
analytical confidence intervals is due to the small sample size, n = 10. The problem of
“insufficient coverage” for bootstrap confidence intervals of σ was discussed in Schenker
(1985): the probability that the bootstrap confidence interval covers the actual value of σ
is lower than expected. This problem may occur when the sample size (n) is small. The
bootstrap method is based on the assumption that the discrete samples
 