the following sections, but it suffices to appreciate that the above statistical tools are needed
because the sample size is finite in practice. In geotechnical engineering, our sample sizes
are typically small and statistical uncertainty cannot be ignored. Simulation is an important
tool to study statistical uncertainty.
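As a minimal sketch of how simulation exposes statistical uncertainty, the snippet below draws many small samples from a known normal random variable and records how much the sample mean varies from one sample to the next. The mean of 100 and standard deviation of 20, the sample sizes, the seed, and the function name sample_mean_spread are illustrative assumptions, and Python's standard library is used purely for convenience.

```python
import random
import statistics


def sample_mean_spread(n, n_repeats=2000, mu=100.0, sigma=20.0, seed=13):
    """Standard deviation of the sample mean over repeated samples of size n.

    mu, sigma, n_repeats, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_repeats)
    ]
    return statistics.stdev(means)


# Compare the simulated spread with the textbook value sigma / sqrt(n).
for n in (5, 20, 100):
    simulated = sample_mean_spread(n)
    theoretical = 20.0 / n ** 0.5
    print(f"n = {n:3d}: simulated = {simulated:5.2f}, sigma/sqrt(n) = {theoretical:5.2f}")
```

The spread shrinks only as the square root of the sample size, which is why statistical uncertainty remains significant for the small samples typical of geotechnical practice.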
Second, it is important to appreciate that a list of “random” looking measurements does
not necessarily follow a random variable model. From the authors' experience in statistical
modeling of geotechnical engineering data, we have found this model adequate in the sense
of producing meaningful and useful results for practice. This chapter presupposes that a
random variable model is adequate for geotechnical engineering data. If one accepts this
leap of faith, the obvious follow-up question is which CDF would be appropriate. In view
of the finite sample size, this “goodness-of-fit” question cannot be resolved with certainty.
Some standard “goodness-of-fit” tests are presented below, but one should be mindful
that it is not sufficient to find a good fit for a list of measurements (say, a column of numbers
in Excel). Geotechnical engineering data are multivariate in nature; for example, several
properties such as the undrained shear strength, natural water content,
Atterberg limits, and preconsolidation pressure may be measured on the same undisturbed sample. While
there is a wide choice of probability models to fit a single column of numbers (univariate
data), there is only one practical choice to fit multiple columns (multivariate data). This
choice involves a column-by-column nonlinear transformation of a multivariate normal
probability model. Because of this restriction, it is more convenient to choose a univariate
probability model that is a transformation of the standard normal model. The Johnson
system of distributions is generated by such a transformation and it is useful to start testing
goodness of fit using this system of distributions.
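To illustrate what "a transformation of the standard normal model" means, the sketch below uses the unbounded (SU) member of the Johnson system, for which X = ξ + λ sinh((Z − γ)/δ) with Z standard normal. The choice of the SU member, the parameter values, and the function names are assumptions made for illustration only; they are not fitted to any data.

```python
import math
import random
import statistics


def johnson_su_from_normal(z, gamma, delta, xi, lam):
    """Map a standard normal value z to a Johnson SU value x."""
    return xi + lam * math.sinh((z - gamma) / delta)


def johnson_su_to_normal(x, gamma, delta, xi, lam):
    """Inverse map: recover the standard normal value z from x."""
    return gamma + delta * math.asinh((x - xi) / lam)


# Hypothetical parameter values, chosen only to produce a skewed variable.
params = dict(gamma=-1.0, delta=2.0, xi=50.0, lam=30.0)

rng = random.Random(13)
z = [rng.gauss(0.0, 1.0) for _ in range(10000)]
x = [johnson_su_from_normal(zv, **params) for zv in z]

# Transforming back should return a sample that is standard normal again.
z_back = [johnson_su_to_normal(xv, **params) for xv in x]
print("mean of back-transformed values:", statistics.fmean(z_back))
print("std  of back-transformed values:", statistics.stdev(z_back))
```

Because the mapping is monotonic and invertible, each column of non-normal data can be converted to a standard normal column, which is what makes the multivariate normal model usable for multiple columns.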
The above “need to know” concepts are explained and illustrated using simulated data
in the sections below. Simulated data are “perfect” in the sense that they are theoretically
derived from a fully defined random variable. Hence, in contrast to actual data, there is no
question that a random variable model works! In addition, it is useful to compare statistics
computed from a finite sample size with the theoretical answers, which are also known since
the random variable is fully defined.
1.2.2 Normal random variable
The normal distribution is also called the Gaussian distribution. Symbolically, “Y ~ N(μ, σ²)”
means that Y is normally distributed with mean μ and standard deviation σ. The normal
distribution is the most important distribution for characterizing a physical parameter that
can take a range of values with different likelihoods of occurrence. Its importance will be
apparent in the context of non-normal multivariate distributions discussed in Section 1.6.
The concepts discussed below are illustrated using normally distributed undrained shear
strength values with a mean of 100 kPa and a standard deviation of 20 kPa, unless stated otherwise.
These values were simulated using the MATLAB function normrnd. The reader can
reproduce the data by initializing the pseudorandom sequence using randn('state', 13).
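For readers without MATLAB, a rough Python analogue of this simulation might look as follows. It is a sketch only: the function name simulate_su is hypothetical, and seeding Python's generator with 13 does not reproduce the MATLAB stream, so the individual values will differ from those used in this chapter.

```python
import random
import statistics

MU, SIGMA = 100.0, 20.0  # kPa, the mean and standard deviation stated in the text


def simulate_su(n, seed=13):
    """Simulate n normally distributed undrained shear strength values (kPa).

    The seed makes this Python run repeatable; it does NOT match MATLAB's
    randn('state', 13) sequence.
    """
    rng = random.Random(seed)
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


# Sample statistics approach the known values of 100 and 20 as n grows.
for n in (10, 100, 10000):
    data = simulate_su(n)
    print(f"n = {n:5d}: sample mean = {statistics.fmean(data):6.2f} kPa, "
          f"sample std = {statistics.stdev(data):5.2f} kPa")
```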
1.2.2.1 Probability density function
The PDF for the normal distribution is
$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(y-\mu)^2}{2\sigma^2} \right] \tag{1.1}$$
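Equation (1.1) can be evaluated directly. The short sketch below codes the normal PDF by hand and checks it against Python's built-in statistics.NormalDist, using the mean of 100 kPa and standard deviation of 20 kPa adopted in this section. The function name normal_pdf is a hypothetical helper.

```python
import math
from statistics import NormalDist


def normal_pdf(y, mu, sigma):
    """Normal PDF: exp(-(y - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


mu, sigma = 100.0, 20.0  # kPa
reference = NormalDist(mu, sigma)

# Compare the hand-coded formula with the library value at a few points.
for y in (60.0, 80.0, 100.0, 120.0, 140.0):
    print(f"y = {y:5.1f} kPa: f(y) = {normal_pdf(y, mu, sigma):.6f}, "
          f"library = {reference.pdf(y):.6f}")
```

The density peaks at y = μ, where it equals 1/(√(2π)σ), about 0.0199 per kPa for σ = 20 kPa, and it is symmetric about the mean.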