even though nature rarely falls into discrete classes. Classification (or
categorization) is useful as it can, for example, help decision makers to take
necessary precautions to reduce risk, to drill an oil well, or to assign fossils
to a particular genus or species. Most classification methods make decisions
based on Boolean logic with two options, true or false; an example is the use
of a threshold value for identifying charcoal in microscope images (Section
8.11). Alternatively, fuzzy logic (which is not explained in this topic) is a
generalization of the binary Boolean logic with respect to many real world
problems in decision-making, where gradual transitions are reasonable
(Zadeh 1965, MathWorks 2014a).
The following sections introduce the most important techniques of
multivariate statistics: principal component analysis (PCA) and cluster
analysis (CA) in Sections 9.2 and 9.5, and independent component analysis
(ICA), which is a nonlinear extension of PCA, in Section 9.3. Section
9.4 introduces discriminant analysis (DA), which is a popular method
of classification in earth sciences. Section 9.6 introduces multiple linear
regression. These sections first provide an introduction to the theory behind
the various techniques and then demonstrate their use for analyzing earth
sciences data, using MATLAB functions (MathWorks 2014b).
9.2 Principal Component Analysis
Principal component analysis (PCA) detects linear dependencies between
variables and replaces groups of correlated variables with new, uncorrelated
variables referred to as the principal components (PCs). PCA was introduced
by Karl Pearson (1901) and further developed by Harold Hotelling (1931).
The performance of PCA is better illustrated with a bivariate data set than
with a multivariate data set. Figure 9.1 shows a bivariate data set that exhibits
a strong linear correlation between the two variables x and y in an orthogonal
xy coordinate system. The two variables have their individual univariate
means and variances (Chapter 3). The bivariate data set can be described by
the bivariate sample mean and the covariance (Chapter 4). The xy coordinate
system can be replaced by a new orthogonal coordinate system, where the
first axis passes through the long axis of the data scatter and the new origin
is the bivariate mean. This new reference frame has the advantage that the
first axis can be used to describe most of the variance, while the second axis
contributes only a small amount of additional information. Prior to this
transformation two axes were required to describe the data set, but it is now
possible to reduce the dimensions of the data by dropping the second axis
without losing very much information, as shown in Figure 9.1.
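
The change of coordinate system described above can be sketched in a few
lines of code. The example below is a minimal illustration in Python (not one
of the MATLAB functions used in this text), and the six data points are
made-up values, not the data shown in Figure 9.1. The function name
bivariate_pca and all variable names are assumptions chosen for this sketch.
It computes the bivariate mean and the sample covariance matrix, obtains the
variances along the two new axes as the eigenvalues of that 2-by-2 matrix, and
rotates the centered data into the new reference frame.

```python
import math

def bivariate_pca(x, y):
    """Rotate two correlated variables into their principal axes."""
    n = len(x)
    # Bivariate sample mean: the origin of the new coordinate system
    mx = sum(x) / n
    my = sum(y) / n
    # Sample variances and covariance (n - 1 in the denominator)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]],
    # i.e. the variances along the first and second new axes
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    variances = (half_trace + radius, half_trace - radius)
    # Angle between the first new axis and the original x axis
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates of the centered data in the new reference frame
    pc1 = [(a - mx) * c + (b - my) * s for a, b in zip(x, y)]
    pc2 = [-(a - mx) * s + (b - my) * c for a, b in zip(x, y)]
    return (mx, my), variances, theta, pc1, pc2

# Hypothetical, strongly correlated example data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

mean, variances, theta, pc1, pc2 = bivariate_pca(x, y)
explained = variances[0] / sum(variances)
print("fraction of variance along the first axis:", explained)
```

Keeping only pc1 and discarding pc2 corresponds to dropping the second axis:
the data are then described by a single variable, at the cost of the small
share of variance that lay along the second axis.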
This process is now expanded to an arbitrary number of variables and