Constructing multivariate distributions for soil parameters - Risk and Reliability in Geotechnical Engineering


probability for σ is noticeably less than 0.95 when n is less than 100. The improvement brought about by the BCa method for the 95% bootstrap confidence intervals of σ for a small sample size (n ≤ 50) is evident. In general, the coverage probability is close to 0.95 for n ≥ 100 for both μ and σ. It is therefore recommended that the sample size n be at least 100 for the bootstrap confidence intervals to work properly.
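The notion of coverage probability can be illustrated with a small simulation. The sketch below is only a rough illustration under stated assumptions: it uses the plain percentile bootstrap (not the BCa method), a standard normal Y, and arbitrary choices for the number of trials and resamples. The function names sample_std, percentile_ci, and coverage are hypothetical helpers, not part of any library.

```python
import math
import random

def sample_std(x):
    """Sample standard deviation (divisor n - 1)."""
    n = len(x)
    m = sum(x) / n
    return math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))

def percentile_ci(x, rng, n_boot=200, level=0.95):
    """Percentile bootstrap confidence interval for sigma (assumed method, not BCa)."""
    n = len(x)
    # Resample with replacement and collect the bootstrap estimates of sigma.
    stats = sorted(sample_std(rng.choices(x, k=n)) for _ in range(n_boot))
    lo_idx = int(math.floor((1 - level) / 2 * (n_boot - 1)))
    hi_idx = int(math.ceil((1 + level) / 2 * (n_boot - 1)))
    return stats[lo_idx], stats[hi_idx]

def coverage(n, rng, n_trials=100, sigma=1.0):
    """Fraction of simulated data sets whose interval contains the true sigma."""
    hits = 0
    for _ in range(n_trials):
        y = [rng.gauss(0.0, sigma) for _ in range(n)]
        lo, hi = percentile_ci(y, rng)
        hits += lo <= sigma <= hi
    return hits / n_trials

rng = random.Random(0)
for n in (10, 100):
    print("n =", n, "estimated coverage =", coverage(n, rng))
```

With so few trials the estimated coverage is itself noisy, so the printed values should be read as indicative only.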

1.2.3.5.3 Goodness-of-fit test (K-S test)

The normal probability plot is a good visual tool to judge whether the normal distribution provides a satisfactory fit to the data. In this section, the Kolmogorov-Smirnov (K-S) test (Conover 1999) is introduced to characterize the goodness of fit for the normal distribution formally using the framework of hypothesis testing. The null hypothesis H_0 for the K-S test is

$$H_0: Y \text{ is normally distributed} \tag{1.23}$$

That is, F(y) = Φ[(y − μ)/σ]. Under this null hypothesis, the following statistic D_n is asymptotically distributed as the Kolmogorov distribution:

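$$D_n = \sqrt{n} \cdot \sup_{y} \left| F_n(y) - F(y) \right| = \sqrt{n} \cdot \sup_{y} \left| F_n(y) - \Phi\left[ (y - \mu)/\sigma \right] \right| \tag{1.24}$$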

where “sup” denotes the supremum (the least upper bound), F_n(y) is the empirical CDF of the data, and n is the sample size. One can see that if the null hypothesis H_0 is true, D_n should be small, because F_n(y) will be close to F(y) under H_0. As a result, H_0 can be rejected if D_n is large. Consider the following criterion for rejecting or not rejecting H_0:

$$\begin{cases} \text{Reject } H_0 & \text{if } D_n > c \\ \text{Do not reject } H_0 & \text{if } D_n \le c \end{cases} \tag{1.25}$$

where c is called the critical value, a prescribed threshold for D_n. The critical value c is typically chosen such that the probability of committing a Type I error [the probability of rejecting a true H_0, namely P(D_n > c)] is equal to a small number α (e.g., α = 0.05). The α value is called the significance level of the test. The threshold c is in fact the (1 − α) quantile of the Kolmogorov distribution and can be found in textbooks.
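The statistic and the critical value can also be computed directly. The following Python sketch assumes that D_n carries the √n scaling (so that its asymptotic law is the Kolmogorov distribution) and evaluates the Kolmogorov CDF through its standard alternating series, 1 − 2 Σ (−1)^(k−1) exp(−2k²x²). The function names, the simulated data, and the values μ = 2 and σ = 0.5 are illustrative assumptions. Because the distribution is asymptotic, the resulting threshold is only approximate for small n.

```python
import math
import random
from statistics import NormalDist

def kolmogorov_cdf(x, terms=100):
    """CDF of the Kolmogorov distribution via its alternating series."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 1.0 - 2.0 * s))

def critical_value(alpha, lo=0.0, hi=5.0):
    """Solve kolmogorov_cdf(c) = 1 - alpha for c by bisection."""
    target = 1.0 - alpha
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ks_statistic(y, mu, sigma):
    """D_n = sqrt(n) * sup |F_n(y) - Phi((y - mu) / sigma)|."""
    n = len(y)
    phi = NormalDist(mu, sigma).cdf
    d = 0.0
    for i, v in enumerate(sorted(y)):
        f = phi(v)
        # At the i-th (0-based) sorted point the empirical CDF jumps
        # from i/n to (i + 1)/n, so both sides of the jump are checked.
        d = max(d, (i + 1) / n - f, f - i / n)
    return math.sqrt(n) * d

rng = random.Random(1)
y = [rng.gauss(2.0, 0.5) for _ in range(200)]   # simulated data (assumption)
d_n = ks_statistic(y, mu=2.0, sigma=0.5)
c = critical_value(0.05)
print("D_n =", d_n, "c =", c, "reject H_0:", d_n > c)
```

For α = 0.05 the bisection returns a value close to the commonly tabulated 1.358.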

In MATLAB, the command [h, p, d_n] = kstest(X, [], α) performs the K-S test for the standard normal distribution. The inputs are α and the vector X that contains the data (X^(1), X^(2), …, X^(n))^T (the superscript “T” denotes the matrix transpose). The outputs are h (h = 1 means H_0 is rejected), p (the p-value), and d_n (the realization of D_n). To implement the standard normal K-S test, one first needs to convert the data (X^(1), X^(2), …, X^(n)) into their standardized form:

$$X^{(i)} = \frac{Y^{(i)} - \mu}{\sigma}, \quad i = 1, \ldots, n \tag{1.26}$$

The p-value is defined to be P(D_n > d_n). The null hypothesis H_0 is rejected if p < α. It can be seen that the p-value quantifies how strong the evidence for rejection is: a small p-value indicates strong rejection. The K-S test with α = 0.05 on the 10 samples of Y gives h = 0 (H_0 is not rejected) and p = 0.835. Therefore, the normal distribution hypothesis is not rejected at a significance level of 0.05. This is expected as the Y samples are simulated from a normal distribution.
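The three outputs h, p, and d_n can be mimicked in a few lines of Python. This is a minimal sketch, not a reimplementation of the MATLAB command: it takes μ and σ as given, includes the √n scaling in d_n, and computes the p-value from the asymptotic Kolmogorov distribution, so the p-value is approximate for small n and need not match MATLAB's output. The function names and the simulated data (10 samples with μ = 10 and σ = 2) are illustrative assumptions, so the result will not reproduce p = 0.835.

```python
import math
import random
from statistics import NormalDist

def kolmogorov_sf(x, terms=100):
    """P(K > x) for the Kolmogorov distribution (alternating series)."""
    if x <= 0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

def standard_normal_ks(y, mu, sigma, alpha=0.05):
    """Return (h, p, d_n) for H_0: Y is normal with mean mu and std sigma."""
    n = len(y)
    # Standardize the data, then compare with the standard normal CDF.
    x = sorted((v - mu) / sigma for v in y)
    phi = NormalDist().cdf
    d = 0.0
    for i, v in enumerate(x):
        f = phi(v)
        d = max(d, (i + 1) / n - f, f - i / n)
    d_n = math.sqrt(n) * d          # realization of D_n (with sqrt(n) scaling)
    p = kolmogorov_sf(d_n)          # asymptotic p-value P(D_n > d_n)
    h = 1 if p < alpha else 0       # h = 1 means H_0 is rejected
    return h, p, d_n

rng = random.Random(0)
y = [rng.gauss(10.0, 2.0) for _ in range(10)]   # simulated samples (assumption)
h, p, d_n = standard_normal_ks(y, mu=10.0, sigma=2.0)
print("h =", h, "p =", p, "d_n =", d_n)
```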

However, if one repeats this procedure with different simulated samples, say 100 times, one
