probability for σ is noticeably less than 0.95 when n is less than 100. The improvement brought about by the BCa method for the 95% bootstrap confidence intervals of σ is evident for small sample sizes (n ≤ 50). In general, the coverage probability is close to 0.95 for both μ and σ when n ≥ 100. It is therefore recommended that the sample size n be at least 100 for the bootstrap confidence intervals to work properly.
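The coverage probability discussed above can be estimated by Monte Carlo simulation. The Python sketch below does this for a plain percentile bootstrap interval for σ, assuming normal data with a known true σ. It does not implement the BCa method, and the function names and default settings (trials, n_boot, seed) are illustrative assumptions, not values taken from the text.

```python
import math
import random


def sample_sd(x):
    """Sample standard deviation with the (n - 1) divisor."""
    n = len(x)
    m = sum(x) / n
    return math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))


def percentile_ci_sd(x, n_boot, alpha, rng):
    """Percentile bootstrap (1 - alpha) confidence interval for sigma (not BCa)."""
    n = len(x)
    # Resample the data with replacement n_boot times and sort the bootstrap SDs.
    stats = sorted(
        sample_sd([x[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lo = stats[math.floor(alpha / 2 * (n_boot - 1))]
    hi = stats[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def coverage_sd(n, sigma=1.0, trials=200, n_boot=300, alpha=0.05, seed=0):
    """Fraction of simulated normal samples whose interval contains the true sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        lo, hi = percentile_ci_sd(x, n_boot, alpha, rng)
        if lo <= sigma <= hi:
            hits += 1
    return hits / trials
```

Calling coverage_sd(20) and coverage_sd(100) lets one compare the estimated coverage with the nominal 0.95 at a small and a large sample size; the estimates themselves carry Monte Carlo error, so more trials give a steadier picture.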
1.2.3.5.3 Goodness-of-fit test (K-S test)
The normal probability plot is a good visual tool for judging whether the normal distribution provides a satisfactory fit to the data. In this section, the Kolmogorov-Smirnov (K-S) test (Conover 1999) is introduced to characterize the goodness of fit of the normal distribution formally, using the framework of hypothesis testing. The null hypothesis H₀ for the K-S test is

$$H_0: Y \text{ is normally distributed} \qquad (1.23)$$
Namely, F(y) = Φ[(y − μ)/σ]. Under this null hypothesis, the following statistic Dₙ is asymptotically distributed according to the Kolmogorov distribution:

$$D_n = \sqrt{n}\cdot\sup_y \left| F_n(y) - F(y) \right| = \sqrt{n}\cdot\sup_y \left| F_n(y) - \Phi\!\left(\frac{y-\mu}{\sigma}\right) \right| \qquad (1.24)$$
where “sup” denotes the supremum (the least upper bound), n is the sample size (the number of data points), and Fₙ(y) is the empirical cumulative distribution function of the data. If the null hypothesis H₀ is true, Dₙ should be small, because Fₙ(y) will be close to F(y) under H₀. Consequently, H₀ can be rejected if Dₙ is large. Consider the following criterion for rejecting or not rejecting H₀:
$$\begin{cases} \text{Reject } H_0 & \text{if } D_n > c \\ \text{Do not reject } H_0 & \text{if } D_n \le c \end{cases} \qquad (1.25)$$
where c is called the critical value, a prescribed threshold for Dₙ. The critical value c is typically chosen such that the probability of committing a Type I error [the probability of rejecting a true H₀, namely P(Dₙ > c)] equals a small number α (e.g., α = 0.05). The value α is called the significance level of the test. The threshold c is in fact the (1 − α) quantile of the Kolmogorov distribution and is tabulated in statistics textbooks.
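Instead of reading c from a table, it can be computed numerically. The Python sketch below evaluates the CDF of the Kolmogorov distribution with its two standard series forms, finds the (1 − α) quantile c by bisection, and applies decision rule (1.25). The function names and the bisection bracket [0.05, 5] are illustrative choices.

```python
import math


def kolmogorov_cdf(x: float) -> float:
    """CDF of the Kolmogorov distribution, P(K <= x)."""
    if x < 0.05:
        return 0.0  # the CDF is below 1e-200 here, so treat it as zero
    if x < 1.0:
        # Theta-function form: sqrt(2*pi)/x * sum exp(-(2k-1)^2 pi^2 / (8 x^2)),
        # which converges quickly for small x.
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
                for k in range(1, 20))
        return math.sqrt(2.0 * math.pi) / x * s
    # Alternating-series form: 1 - 2 * sum (-1)^(k-1) exp(-2 k^2 x^2),
    # which converges quickly for x >= 1.
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, 20))
    return 1.0 - 2.0 * s


def kolmogorov_critical_value(alpha: float) -> float:
    """(1 - alpha) quantile c of the Kolmogorov distribution, found by bisection."""
    lo, hi = 0.05, 5.0  # CDF(0.05) is essentially 0 and CDF(5) is essentially 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reject_h0(d_n: float, alpha: float = 0.05) -> bool:
    """Decision rule (1.25): reject H0 if D_n > c."""
    return d_n > kolmogorov_critical_value(alpha)
```

For α = 0.05 this gives c ≈ 1.358, and for α = 0.01 it gives c ≈ 1.628, matching the usual tabulated asymptotic values.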
In MATLAB, the command [h, p, d_n] = kstest(X, [], α) performs the K-S test against the standard normal distribution. The inputs are the vector X = (X(1), X(2), …, X(n))ᵀ that contains the data (the superscript T denotes the matrix transpose) and the significance level α. The outputs are h (h = 1 means H₀ is rejected), p (the p-value), and d_n (the realization dₙ of Dₙ). Because the test is against the standard normal distribution, the data (Y(1), Y(2), …, Y(n)) must first be converted into their standardized form:
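$$X(k) = \frac{Y(k) - m}{s}, \qquad k = 1, 2, \ldots, n \qquad (1.26)$$

where m and s are the sample mean and sample standard deviation of the Y data.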
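The whole procedure, standardizing the data with Eq. (1.26), computing dₙ from Eq. (1.24), and converting it to a p-value, can be sketched in Python as below. The function ks_normal_test is a hypothetical analogue of the MATLAB workflow, not MATLAB's implementation: its p-value comes from the asymptotic Kolmogorov distribution, so for small n it need not match the value kstest reports.

```python
import math
import statistics


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kolmogorov_cdf(x: float) -> float:
    """CDF of the Kolmogorov distribution, P(K <= x), via its two series forms."""
    if x < 0.05:
        return 0.0  # negligibly small
    if x < 1.0:
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
                for k in range(1, 20))
        return math.sqrt(2.0 * math.pi) / x * s
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, 20))
    return 1.0 - 2.0 * s


def ks_normal_test(y, alpha=0.05):
    """Hypothetical K-S normality check; returns (h, p, d_n)."""
    n = len(y)
    m = statistics.mean(y)
    s = statistics.stdev(y)              # sample SD with the (n - 1) divisor
    x = sorted((v - m) / s for v in y)   # Eq. (1.26): standardized data
    # sup |F_n - Phi| occurs at a data point, either just before or at a jump of F_n.
    sup = max(max((i + 1) / n - normal_cdf(xi), normal_cdf(xi) - i / n)
              for i, xi in enumerate(x))
    d_n = math.sqrt(n) * sup             # Eq. (1.24)
    p = 1.0 - kolmogorov_cdf(d_n)        # p-value P(D_n > d_n) under H0
    h = 1 if p < alpha else 0            # h = 1 means H0 is rejected
    return h, p, d_n
```

A clearly bimodal sample such as ten 0s followed by ten 10s is rejected (h = 1). Because m and s are estimated from the same data, the Kolmogorov distribution is only an approximation here, and the resulting p-values tend to be conservative (too large).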
The p-value is defined as P(Dₙ > dₙ), computed under H₀. The null hypothesis H₀ is rejected if p < α. The p-value therefore quantifies how strong the rejection is: a small p-value indicates a strong rejection. The K-S test with α = 0.05 on the 10 samples of Y gives h = 0 (H₀ is not rejected) and p = 0.835. Therefore, the normal distribution hypothesis is not rejected at a significance level of 0.05. This is expected because the Y samples are simulated from a normal distribution. However, if one repeats this procedure with different simulated samples, say 100 times, one