occupy multiple columns. Bivariate data can be visualized as a point in a scatter plot.
Multivariate data can also be visualized as a point in higher-dimensional space, but it is
hard to visualize scatter beyond our familiar three-dimensional space. We have empha-
sized that the two values in any given row are linked for bivariate data. For the case of
bivariate normal data, we have discussed the characterization of this linkage or depen-
dency using the Pearson correlation coefficient, a number lying between −1 and 1. If there
are three columns of data, it is easy to envisage the computation of three distinct correla-
tion coefficients: one between columns 1 and 2, one between columns 1 and 3, and one
between columns 2 and 3. The astute reader may generalize the Pearson correlation coef-
ficient involving the product of two columns to a higher-order coefficient involving the
product of three columns. Although these higher-order product-moment coefficients can
be defined mathematically, they are not used in practice for two reasons. First, statistical
uncertainty increases with the order of the product-moment coefficient. In other words,
a coefficient estimated from three columns carries more statistical uncertainty
than one estimated from two columns. Second, the only practical multivariate probability
model is the multivariate normal model. This model requires only the computation of
all bivariate Pearson correlation coefficients. It does not require higher-order product-
moment coefficients. In fact, all higher-order product-moment coefficients can be com-
puted from Pearson correlation coefficients using closed-form equations for the special
case of multivariate normal data (Isserlis 1918).
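As a minimal sketch of this point, the Python snippet below simulates three columns of standardized multivariate normal data, computes the three pairwise Pearson correlation coefficients, and compares an empirical fourth-order product moment with the value obtained from the pairwise correlations alone through the Isserlis relation. The correlation matrix, the sample size, and the function names are illustrative assumptions and do not come from any site data.

```python
import numpy as np


def pearson_matrix(data):
    """Pearson correlation coefficients between all pairs of columns."""
    return np.corrcoef(data, rowvar=False)


def standardize(data):
    """Shift each column to zero mean and scale it to unit standard deviation."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def isserlis_fourth(rho, i, j, k, l):
    """E[Zi Zj Zk Zl] for standardized multivariate normal variables.

    Isserlis relation: the fourth-order product moment is the sum over the
    three ways of pairing the four indices, so only pairwise correlations
    are needed.
    """
    return rho[i, j] * rho[k, l] + rho[i, k] * rho[j, l] + rho[i, l] * rho[j, k]


# Hypothetical correlation matrix for three columns (illustrative values only).
true_rho = np.array([[1.0, 0.6, 0.3],
                     [0.6, 1.0, 0.5],
                     [0.3, 0.5, 1.0]])

rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(3), true_rho, size=200_000)

rho = pearson_matrix(data)   # three distinct off-diagonal coefficients
z = standardize(data)

# Third-order product moment: zero in theory for zero-mean normal data.
third_order = np.mean(z[:, 0] * z[:, 1] * z[:, 2])

# Fourth-order product moment E[Z0^2 Z1 Z2]: empirical versus Isserlis.
fourth_empirical = np.mean(z[:, 0] ** 2 * z[:, 1] * z[:, 2])
fourth_predicted = isserlis_fourth(rho, 0, 0, 1, 2)

print("pairwise correlations:", rho[0, 1], rho[0, 2], rho[1, 2])
print("third-order moment:", third_order)
print("fourth-order moment (empirical, predicted):",
      fourth_empirical, fourth_predicted)
```

The higher-order moments fluctuate more from sample to sample than the pairwise correlations do, which is the first of the two reasons given above for not using them in practice.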
Multivariate information is usually gathered in a typical site investigation. For instance,
when undisturbed samples are extracted for oedometer and triaxial tests, the SPT and/or the
piezocone test (CPTU) may be conducted in close proximity. Moreover, index properties such as
the unit weight, natural water content, plastic limit, LL, and liquidity index (LI) are com-
monly determined from relatively simple laboratory tests on disturbed samples. It is gener-
ally known that data from these varied sources are not independent if they are measured
in close physical proximity. The definition of “close” is related to the spatial variability of
the site. These data sources are typically correlated with a design parameter, for example, the
undrained shear strength (s_u). These correlations can be exploited to reduce the COV of the
design parameter. The benefit for RBD is direct: a smaller COV means less uncertainty in the
design parameter that enters the reliability calculation.
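For the simplest case, the mechanism can be written out explicitly. The relations below are the standard conditional results for a bivariate normal pair and are offered only as an illustration. The symbols are assumptions introduced here: Y is the design parameter (or a transform of it that is normally distributed), X is a correlated measurement, and ρ is their Pearson correlation coefficient.

```latex
\mu_{Y \mid X = x} = \mu_Y + \rho \, \frac{\sigma_Y}{\sigma_X} \, (x - \mu_X),
\qquad
\sigma_{Y \mid X} = \sigma_Y \sqrt{1 - \rho^2} \;\le\; \sigma_Y
```

The conditional standard deviation does not depend on the measured value x, and it shrinks as |ρ| approaches 1. With ρ = 0.8, for example, the standard deviation drops to 60% of its unconditional value.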
When sufficient multivariate geotechnical data exist, it is of considerable practical
value to construct a multivariate probability distribution function, which is usually
based on the multivariate normal distribution (Ching and Phoon 2012, 2013; Ching
et al. 2014b). The applications include: (a) deriving the mean and COV of any parameter
given the information contained in a subset with possibly more than one parameter, and
(b) evaluating if new strong bivariate correlations can be found either among the original
components or some derived components. For the former, the COV of a design
parameter, say the undrained shear strength (s_u), is likely to decrease once other parameters, say the
normalized cone tip resistance and OCR, have been measured. This aspect is significant for
RBD. In fact, COV reduction can be viewed as a measure of the value of information and
may eventually underpin a defensible “information-sensitive” framework for justifying the
investment to measure an additional parameter. In addition, the ability to predict the exis-
tence of new correlations not included as part of the model calibration provides a stronger
scientific underpinning to correlation studies in geotechnical engineering. The reason is that
these predictions can be falsified by taking new observations, which is the cornerstone of
the scientific method. In other words, it is a lot harder to develop multivariate models, but
if they stand the test of time, they are usually more robust than bivariate models. There is
an oft-expressed adage, “don't confuse me with more data!”, that nicely expresses the
challenge of combining varied data sources.
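Application (a) can be sketched with the standard conditioning formulas for a multivariate normal vector. The Python example below is a hedged illustration, not the procedure of the cited studies. It assumes that the logarithms of s_u, the normalized cone tip resistance, and OCR are jointly normal, and every number in it (means, standard deviations, correlations, measured values) is hypothetical. The function names are likewise introduced here for illustration.

```python
import numpy as np


def conditional_normal(mean, cov, known_idx, known_values):
    """Condition a multivariate normal vector on observed components.

    Returns (unknown_idx, conditional mean, conditional covariance) for the
    components that were not observed.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    known_idx = np.asarray(known_idx, dtype=int)
    known_values = np.asarray(known_values, dtype=float)
    unknown_idx = np.setdiff1d(np.arange(mean.size), known_idx)

    s_uu = cov[np.ix_(unknown_idx, unknown_idx)]
    s_uk = cov[np.ix_(unknown_idx, known_idx)]
    s_kk = cov[np.ix_(known_idx, known_idx)]

    # gain = s_uk @ inv(s_kk), computed with a linear solve (s_kk is symmetric).
    gain = np.linalg.solve(s_kk, s_uk.T).T
    cond_mean = mean[unknown_idx] + gain @ (known_values - mean[known_idx])
    cond_cov = s_uu - gain @ s_uk.T
    return unknown_idx, cond_mean, cond_cov


def lognormal_cov(var_log):
    """COV of a lognormal variable whose logarithm has variance var_log."""
    return np.sqrt(np.exp(var_log) - 1.0)


# Hypothetical model for [ln(s_u), ln(normalized cone resistance), ln(OCR)].
mean_log = np.array([3.5, 1.8, 0.7])
std_log = np.array([0.5, 0.4, 0.6])
corr = np.array([[1.0, 0.7, 0.6],
                 [0.7, 1.0, 0.5],
                 [0.6, 0.5, 1.0]])
cov_log = corr * np.outer(std_log, std_log)

# COV of s_u before any other parameter is measured.
cov_prior = lognormal_cov(cov_log[0, 0])

# COV of s_u after measuring the other two parameters (hypothetical values).
measured = np.log([8.0, 2.5])
_, post_mean, post_cov = conditional_normal(mean_log, cov_log, [1, 2], measured)
cov_post = lognormal_cov(post_cov[0, 0])

print("COV of s_u without extra measurements:", cov_prior)
print("COV of s_u given cone resistance and OCR:", cov_post)
```

Conditioning on only one of the two measurements, by passing a single index in `known_idx`, gives an intermediate COV. The difference between the two results is one way to express the value of the additional measurement.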