A principal objective when analyzing complex data sets is to reduce the dimensionality
of the data set in order to separate important features from redundant information
and noise. The reduced data set is then simpler to interpret. For example, an excitation-emission
matrix (EEM) data set consisting of many thousands of individual data points (samples
× excitation × emission) may be reduced to a fraction of its original size (e.g., samples ×
intensity at a few wavelength pairs). This greatly increases interpretability and the opportunities
for visualizing the data graphically, while retaining all of the essential information
contained in the original data set. A second principal objective is to detect patterns in the
relationships between variables in order to develop prediction or calibration models for other
important parameters that are harder to measure.
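To make the size reduction concrete, the following sketch stores a synthetic EEM data set as a three-way NumPy array (samples × excitation × emission) and keeps only the intensities at a few (excitation, emission) pairs. The wavelength grids, the number of samples, and the chosen pairs are hypothetical values for illustration, and the random intensities stand in for measured spectra. Picking fixed wavelength pairs is only the simplest form of reduction; the chemometric methods discussed in this chapter derive the reduced representation from the data rather than fixing it in advance.

```python
import numpy as np

# Hypothetical wavelength grids (nm); real instruments and studies differ.
excitation = np.arange(250, 455, 5)   # 41 excitation wavelengths, 250-450 nm
emission = np.arange(300, 602, 2)     # 151 emission wavelengths, 300-600 nm

# Synthetic stand-in for an EEM data set: samples x excitation x emission.
rng = np.random.default_rng(0)
n_samples = 100
eems = rng.random((n_samples, excitation.size, emission.size))

# Hypothetical (excitation, emission) pairs, chosen for illustration only.
pairs = [(275, 340), (320, 420), (350, 450)]


def intensities_at_pairs(eems, excitation, emission, pairs):
    """Return a samples x len(pairs) table of intensities at the nearest grid points."""
    ex_idx = [int(np.argmin(np.abs(excitation - ex))) for ex, _ in pairs]
    em_idx = [int(np.argmin(np.abs(emission - em))) for _, em in pairs]
    # Two equal-length index lists on adjacent axes select one cell per pair,
    # giving an array of shape (n_samples, len(pairs)).
    return eems[:, ex_idx, em_idx]


reduced = intensities_at_pairs(eems, excitation, emission, pairs)
print(eems.size, "->", reduced.size)  # prints: 619100 -> 300
```

With these settings the 619,100 intensities in the three-way array shrink to a 100 × 3 table that can be plotted or correlated against other variables directly.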
This chapter is not intended as a statistics tutorial so much as a broad overview of the
available chemometric techniques likely to be of greatest assistance for interpreting
fluorescence data. The list is by no means exhaustive: there are literally thousands of different
techniques and variations, and describing all of them would require many
textbooks. Similarly, the algorithms underpinning the chemometric methods discussed here
are presented elsewhere (Massart et al., 1988; Martens and Næs, 1989; Smilde
et al., 2004), and many useful tutorials for understanding and applying various techniques
have been published (e.g., Geladi and Kowalski, 1986; Thomas, 1994; Bro, 1997; Stedmon and
Bro, 2008) or are available online (see, e.g., http://www.models.life.ku.dk/). The focus here
is on describing how chemometric methods can be, and already have been, used for the
interpretation of natural organic matter fluorescence.
An example data set is used to demonstrate the application of some of the chemometric
techniques discussed in this chapter. The data set is derived from the Horsens catchment,
Denmark, and consists of fluorescence EEMs and absorbance measured at 254 nm, together
with dissolved organic carbon (DOC) and the nutrients total dissolved phosphorus (TDP),
total dissolved nitrogen (TDN), dissolved organic phosphorus (DOP = TDP - dissolved
inorganic P), and dissolved organic nitrogen (DON = TDN - dissolved inorganic N). The
locations of the sampling sites are illustrated in Figure 10.1. Detailed information about the data set
(20 sites, n = 543 samples) is presented in Stedmon et al. (2006). Previously, an
eight-component model was obtained using PARAllel FACtor analysis (PARAFAC; see later) on a
data set consisting of these samples together with more than 600 samples generated during
a series of degradation experiments (Stedmon and Markager, 2005a). Analyses presented
herein were performed using PLS_toolbox (v. 6.0.1) running in MATLAB (R2010a).
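DOP and DON are obtained by difference rather than measured directly. A minimal Python sketch of that arithmetic follows, assuming the total and inorganic concentrations are already in the same units. The function name `organic_by_difference` and the sample values are hypothetical and are not taken from the Horsens data set. Flagging negative differences is an added assumption: because each result is the difference of two measured quantities, analytical error can push a small organic pool below zero.

```python
import numpy as np


def organic_by_difference(total_dissolved, dissolved_inorganic):
    """Estimate a dissolved organic nutrient pool by difference.

    DOP = TDP - DIP and DON = TDN - DIN. Both inputs must share units
    (e.g. micromolar). Negative results are returned unchanged and are
    reported in a separate boolean mask so they can be reviewed.
    """
    total = np.asarray(total_dissolved, dtype=float)
    inorganic = np.asarray(dissolved_inorganic, dtype=float)
    organic = total - inorganic
    negative = organic < 0  # True where analytical error exceeds the organic pool
    return organic, negative


# Hypothetical concentrations (µM) for three samples, for illustration only.
tdn = [120.0, 95.5, 60.2]
din = [100.0, 80.5, 61.0]
don, flagged = organic_by_difference(tdn, din)
print(don, flagged)
```

Whether flagged values are kept, set to zero, or excluded is a choice that depends on the analytical uncertainty of the methods involved.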
10.2 Multivariate and Multiway Data Sets
The distinction between multivariate and multiway data is best illustrated by example
(Figure 10.2). A simple multivariate data set consists of I samples for which fluorescence
intensities at five emission wavelengths (x1–x5) have been measured at a fixed excitation
wavelength; these data are arranged in a table (Figure 10.2A). Now assume that the
experiment had been conducted at four increasing temperatures (t1–t4). There are
now two possible ways to arrange the new data set, either as a two-way (multivariate)