Environmental Engineering Reference
probability for σ is noticeably less than 0.95 when n is less than 100. The improvement
brought about by the BCa method for the 95% bootstrap confidence intervals of σ for a
small sample size (n ≤ 50) is evident. In general, the coverage probability is close to 0.95 for
n ≥ 100 for both μ and σ. It is therefore recommended that the sample size n should be ≥100
for the bootstrap confidence intervals to work properly.
1.2.3.5.3 Goodness-of-fit test (K-S test)
The normal probability plot is a good visual tool to judge whether the normal distribu-
tion provides a satisfactory fit to the data. In this section, the K-S test (Conover 1999) is
introduced to characterize the goodness of fit for the normal distribution formally using the
framework of hypothesis testing. The null hypothesis H_0 for the K-S test is
H_0: Y is normally distributed    (1.23)
That is, F(y) = Φ[(y − μ)/σ]. Under this null hypothesis, the following statistic D_n is
asymptotically distributed according to the Kolmogorov distribution:
D_n = \sqrt{n} \cdot \sup_y \left| F_n(y) - F(y) \right| = \sqrt{n} \cdot \sup_y \left| F_n(y) - \Phi\left( \frac{y - \mu}{\sigma} \right) \right|    (1.24)
where “sup” denotes the supremum (the least upper bound), F_n(y) is the empirical cumulative
distribution function of the data, and n is the sample size. If the null hypothesis H_0 is true,
D_n should be small, because F_n(y) will be close to F(y) under H_0. As a result, H_0 can be
rejected if D_n is large. Consider the following criterion for rejecting or not rejecting H_0:
Reject H_0 if D_n > c
Do not reject H_0 if D_n \leq c    (1.25)
where c is called the critical value, a prescribed threshold for D_n. The critical value c is
typically chosen such that the probability of committing a Type I error [the probability of
rejecting a true H_0, namely P(D_n > c)] is equal to a small number α (e.g., α = 0.05). The α
value is called the significance level of the test. The threshold c is in fact the 100(1 − α)th
percentile of the Kolmogorov distribution and can be found in textbooks.
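As a rough illustration of how c could be obtained numerically rather than from tables, the Python sketch below evaluates the Kolmogorov distribution through its standard series K(x) = 1 − 2 Σ_{k≥1} (−1)^(k−1) exp(−2k²x²) and solves K(c) = 1 − α by bisection. The function names kolmogorov_cdf and critical_value, the truncation at 100 terms, and the bisection bracket are illustrative assumptions, not part of the original text.

```python
import math


def kolmogorov_cdf(x, terms=100):
    """CDF of the Kolmogorov distribution via its alternating series.

    K(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)
    """
    if x < 0.2:
        # K(x) is negligible (below 1e-10) here and the series converges slowly.
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


def critical_value(alpha, lo=0.2, hi=5.0, tol=1e-10):
    """Solve K(c) = 1 - alpha by bisection (bracket is an assumed choice)."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c = critical_value(0.05)
print(round(c, 3))  # approximately 1.358
```

This threshold applies to the scaled statistic D_n of Equation 1.24. Tables that list the critical value as roughly 1.36/√n refer to the unscaled supremum instead.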
In MATLAB, the command [h, p, d_n] = kstest(X, [], α) performs the K-S test for the standard
normal distribution. The inputs are the vector X that contains the data (X^(1), X^(2), …,
X^(n))^T (the superscript 'T' denotes the matrix transpose) and α. The outputs are h (h = 1
means H_0 is rejected), p (the p-value), and d_n (the realization of D_n). To implement the
standard normal K-S test, one first needs to convert the data (Y^(1), Y^(2), …, Y^(n)) into
their standardized form:
X^{(k)} = \frac{Y^{(k)} - m}{s}    (1.26)
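The standardization in Equation 1.26 and the statistic in Equation 1.24 can be sketched in Python as follows. This is a minimal illustration, assuming that m and s are the sample mean and the sample standard deviation (n − 1 denominator) of the Y data. The helper names standardize and ks_statistic and the simulated data are hypothetical and are not the data used in the text.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def standardize(y):
    """Equation 1.26: X = (Y - m) / s, with m and s assumed to be the
    sample mean and sample standard deviation of y."""
    m, s = mean(y), stdev(y)
    return [(v - m) / s for v in y]


def ks_statistic(x, cdf=NormalDist().cdf):
    """Equation 1.24: D_n = sqrt(n) * sup_y |F_n(y) - F(y)|.

    The empirical CDF F_n jumps at each sorted data point, so the supremum
    is attained just before or just at one of those points.
    """
    xs = sorted(x)
    n = len(xs)
    sup = 0.0
    for i, v in enumerate(xs, start=1):
        f = cdf(v)
        sup = max(sup, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * sup


random.seed(0)
y = [random.gauss(5.0, 2.0) for _ in range(10)]  # hypothetical sample of Y
x = standardize(y)
d_n = ks_statistic(x)
print(d_n)
```

Library routines often report the unscaled supremum rather than the scaled D_n, so a reported value may need to be multiplied by √n before it is compared with this one.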
The p-value is defined to be P(D_n > d_n). The null hypothesis H_0 is rejected if p < α. The
p-value quantifies how strong the rejection is: a small p-value indicates strong
rejection. The K-S test with α = 0.05 on the 10 samples of Y gives h = 0 (H_0 is not rejected)
and p = 0.835. Therefore, the normal distribution hypothesis is not rejected at a significance
level of 0.05. This is expected, as the Y samples are simulated from a normal distribution.
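The link between d_n, the p-value, and the decision h can be sketched in Python as below. The sketch uses the asymptotic Kolmogorov distribution for P(D_n > d_n), so for small n its p-value will generally differ from the one a statistics package reports. The names kolmogorov_sf and ks_decision and the d_n values are hypothetical.

```python
import math


def kolmogorov_sf(d, terms=100):
    """Asymptotic P(D_n > d) under H_0: 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 d^2)."""
    if d < 0.2:
        # The survival probability is essentially 1 for such small d.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * d * d)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def ks_decision(d_n, alpha=0.05):
    """Return (h, p): h = 1 rejects H_0 (p < alpha), h = 0 does not."""
    p = kolmogorov_sf(d_n)
    h = 1 if p < alpha else 0
    return h, p


for d_n in (0.6, 1.0, 1.5):  # hypothetical realizations of the scaled D_n
    h, p = ks_decision(d_n)
    print(f"d_n = {d_n:.1f}  p = {p:.3f}  h = {h}")
```

Rejecting when p < α is equivalent to rejecting when d_n exceeds the critical value c, since both compare d_n against the same upper tail of the Kolmogorov distribution.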
However, if one repeats this procedure with different simulated samples, say 100 times, one
 
 