Environmental Engineering Reference
normal vector as a generalization of the bivariate case coupled by a correlation matrix,
(4) single non-normal random variable as a nonlinear transform of the normal random
variable, and (5) multivariate non-normal vector as a component-by-component nonlinear
transform of the multivariate normal case. No prior knowledge of probability and statistics
is required, but the reader may need to read standard texts for details. The emphasis in this
chapter is on how to use the theoretical tools to produce useful results in practice. In other
words, given a table of measured numbers (multivariate data), how would an engineer (1)
identify a reasonable probability model (“goodness-of-fit” problem) from data, (2) estimate
the model parameters (e.g., mean, COV) from data, (3) simulate “virtual site” data from
the probability model, and (4) draw useful engineering conclusions from the probability
model? The sample size of geotechnical data is typically small. Statistical uncertainties are
ubiquitous and play a significant role in practice. Complete multivariate data are also rarely
available. These aspects and other important limitations are comprehensively discussed to
ensure that the engineer is fully informed of the practical limits of statistical inference in
geotechnical engineering.
1.2 NORMAL RANDOM VARIABLE
1.2.1 Random data
Random data can be viewed as a list of numbers that takes a range of values, with each
value occurring at a different frequency, as is apparent when the data are plotted as a
histogram. Random data can be modeled as a random variable following a cumulative
distribution function (CDF). For continuous variables, the CDF can equivalently be presented
in the form of its derivative, which is called the probability density function (PDF).
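The relationship between the CDF and the PDF can be checked numerically. The following is a minimal Python sketch, not taken from the source: it assumes a normal random variable with illustrative parameters (mean 100, standard deviation 20) and uses a hypothetical helper, numerical_pdf, that differentiates the CDF by central differences and compares the result with the closed-form PDF.

```python
from statistics import NormalDist

# Illustrative normal random variable; the parameters are assumptions for this sketch.
rv = NormalDist(mu=100.0, sigma=20.0)

def numerical_pdf(x, h=1e-4):
    """Central-difference approximation of the derivative of the CDF at x."""
    return (rv.cdf(x + h) - rv.cdf(x - h)) / (2.0 * h)

# Compare the numerical derivative of the CDF with the closed-form PDF.
for x in (60.0, 100.0, 130.0):
    print(f"x = {x:6.1f}  dCDF/dx = {numerical_pdf(x):.6f}  PDF = {rv.pdf(x):.6f}")
```

The two columns should agree to within the accuracy of the finite-difference step, which is the sense in which the PDF is the derivative of the CDF.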
It is crucial to distinguish between a random variable and a list of measured values, say a
list of undrained shear strength (s_u) values obtained by performing unconfined compression
tests on undisturbed samples. The former is a mathematical model. The latter is reality:
what you measure in practice. There are two challenges in linking what you measure to a
random variable.
First, the number of data points in a list of measurements (called sample size) must be finite.
It is relatively easy to simulate a finite list of values if the random variable is defined. For
example, if the undrained shear strength is normally distributed with a mean of 100 kPa and
a standard deviation of 20 kPa, we can obtain, say, 30 values using the MATLAB® function
normrnd(100, 20, 30, 1). You can also perform the simulation using Data > Data Analysis >
Random Number Generation in Excel. It is important to note that the theoretical properties
such as the mean of a random variable can be obtained only from an infinite sample (called
a population). The arithmetic average obtained from a finite sample is called the “sample
mean.” In this chapter, the term “mean” is associated with a random variable while the
term “sample mean” is associated with a finite sample. The same terminology applies to
other properties also. It is possible to simulate different finite samples. Given the random
nature of the data, the sample mean computed from one sample will be different from the
sample mean computed from another sample. This phenomenon is called “statistical uncertainty,”
and it is crucial to appreciate that all quantities estimated from a finite sample are
subject to this fundamental limitation. The upshot is that no theoretical property can
be estimated with perfect precision. A point estimate is implicitly associated with a
statistical error. It is arguably more accurate to report an estimate of a theoretical property
in the form of a confidence interval. An alternative is to report the p-value associated
with a null hypothesis for a theoretical property. These concepts will be made specific in
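The simulation and the statistical uncertainty described in this paragraph can be illustrated with a short Python sketch. It is not the authors' code: it stands in for the MATLAB call normrnd(100, 20, 30, 1) using the standard library, and the helper names, the seed, and the choice of a 95% Student-t interval are assumptions made for illustration. The t quantile is the tabulated value for 29 degrees of freedom, so the interval helper applies only to samples of size 30.

```python
import random
from statistics import mean, stdev

MEAN_KPA, SD_KPA, N = 100.0, 20.0, 30  # mean, standard deviation, sample size from the text
T_975_DF29 = 2.045  # tabulated Student-t quantile (97.5%, 29 degrees of freedom)

def simulate_sample(rng, n=N):
    """Draw n values from the normal random variable, analogous to normrnd(100, 20, n, 1)."""
    return [rng.gauss(MEAN_KPA, SD_KPA) for _ in range(n)]

def confidence_interval_95(sample):
    """95% confidence interval for the mean; uses T_975_DF29, so valid only for n = 30."""
    m = mean(sample)
    half_width = T_975_DF29 * stdev(sample) / len(sample) ** 0.5
    return m - half_width, m + half_width

rng = random.Random(1)  # arbitrary fixed seed so that the run is repeatable
for i in range(5):
    sample = simulate_sample(rng)
    lower, upper = confidence_interval_95(sample)
    print(f"sample {i + 1}: sample mean = {mean(sample):6.2f} kPa, "
          f"95% CI = ({lower:6.2f}, {upper:6.2f}) kPa")
```

Each simulated sample gives a different sample mean even though the mean of the random variable is fixed at 100 kPa. The width of each interval indicates how much a point estimate from 30 values can be expected to vary.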