EMs. An improved strategy, called variable MESMA (VMESMA), allows segmentation of
the image to increase flexibility and accuracy (Kemper & Sommer 2003).
2.2 Modeling
Modeling refers to relating a set of spectral parameters that are derived from the spectral
information (before or after the aforementioned preprocessing treatment), to the real
chemistry of the material in question. This is done by using a set of well-known samples as
a training group. The data are divided into three groups: training, validation and test. The
relationship between the chemistry and the spectroscopy data is found via the training
group and simultaneously cross-validated by the validation group. Finally, the model is
applied to the test group, independent of the training and validation process. Multivariate
regression techniques are modeling methods that search for the relationship between two
matrices: the spectral data matrix that can be very complex due to large amounts of data (X
variables, the independent data), and a specific chemical reference value data matrix (Y
variables, the dependent data). The common multivariate regression techniques are
presented below. For an in-depth treatment, see Esbensen et al. (2002) and
Nicolaï et al. (2007).
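The three-way division described above can be sketched in a few lines. This is a minimal illustration only: the function name split_samples, the 60/20/20 proportions and the fixed random seed are assumptions made for the example, not values given in the text.

```python
import random

def split_samples(n_samples, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle sample indices and split them into training, validation and test groups.

    The fractions and the seed are illustrative assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_train = int(round(frac_train * n_samples))
    n_val = int(round(frac_val * n_samples))
    train = indices[:n_train]                 # used to fit the model
    val = indices[n_train:n_train + n_val]    # used to validate while modeling
    test = indices[n_train + n_val:]          # held out until the very end
    return train, val, test

train_idx, val_idx, test_idx = split_samples(50)
```

The returned indices would then be used to select matching rows of the spectral matrix (X) and the reference matrix (Y), so that the test group takes no part in fitting or validation.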
2.2.1 Multiple linear regression
Multiple linear regression (MLR) is a classical method that creates a linear combination of
the spectral values at every single wavelength to correlate as closely as possible to the
dependent reference values. The regression coefficients are estimated by minimizing the
error between predicted and observed response values in a least squares sense. MLR models
typically do not perform well with spectral data because such data usually exhibit high
collinearity, noise, and more variables (i.e., spectral bands) than measured samples
(Esbensen et al. 2002; Nicolaï et al. 2007).
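As an illustration of the least-squares estimation, the sketch below fits an MLR model to a small synthetic data set using only the Python standard library. The helper names (solve_linear_system, fit_mlr, predict_mlr), the toy band values and the use of the normal equations are assumptions chosen for clarity; a numerical library would normally be used instead.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear bands or too few samples")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in X]                # prepend intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

def predict_mlr(coefs, X):
    """Apply the fitted linear combination to new spectra."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in X]

# Synthetic example (assumed values): 5 samples, 2 bands, y = 1 + 2*x1 - 3*x2
X = [[0.10, 0.30], [0.20, 0.10], [0.40, 0.50], [0.60, 0.20], [0.90, 0.70]]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in X]
coefs = fit_mlr(X, y)
predictions = predict_mlr(coefs, X)
```

When bands are collinear, or when there are more coefficients than samples, the matrix built in fit_mlr becomes singular or nearly so and the coefficients are no longer well determined, which is the weakness noted above.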
2.2.2 Principal component regression
Principal component regression (PCR) is a combination of principal component analysis
(PCA) and MLR. The independent data matrix (spectral data) is transformed by PCA, and
the first few principal components (PCs), which represent most of the independent data
variance, are used as inputs for the MLR model instead of the original spectral data. The
advantage over standard MLR is that the PCs are uncorrelated and much of the noise is
filtered out. The first few PCs are usually sufficient for a robust model, and the risk of
over-fitting is reduced. Although PCR combines the two most studied multivariate methods
(PCA and MLR), the major criticism against it is that the first several PCs selected as MLR
inputs are not necessarily the best predictors of the reference data. There is no guarantee
that the first PCs will include the spectral data related to the specific dependent variable that
needs to be modeled (Esbensen et al. 2002; Nicolaï et al. 2007).
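The two PCR steps (PCA on the spectra, then regression on the leading PC scores) can be sketched as follows. This is a standard-library illustration under stated assumptions: the function names fit_pcr and predict_pcr, the power-iteration route to the PCs and the toy data are all choices made for the example rather than anything prescribed by the text.

```python
import math

def fit_pcr(X, y, n_components, n_iter=200):
    """Fit PCR: PCA on centred X, then least squares of y on the PC scores."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Scatter matrix Xc^T Xc; its eigenvectors are the PC loadings.
    C = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]

    loadings, gammas = [], []
    for _ in range(n_components):
        # Power iteration: repeated multiplication by C pulls v toward the
        # dominant eigenvector (assuming the start is not orthogonal to it).
        v = [1.0 / math.sqrt(p)] * p
        for _step in range(n_iter):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                raise ValueError("no variance left for another component")
            v = [x / norm for x in w]
        eigval = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))
        # Deflate so the next pass finds the next-largest component.
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]

        # Scores on this PC. Because PC scores are uncorrelated, each
        # regression coefficient can be estimated independently.
        t = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
        gammas.append(sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t))
        loadings.append(v)
    return x_mean, y_mean, loadings, gammas

def predict_pcr(model, X):
    """Project new spectra onto the stored PCs and apply the regression."""
    x_mean, y_mean, loadings, gammas = model
    preds = []
    for row in X:
        centred = [a - m for a, m in zip(row, x_mean)]
        scores = [sum(c * l for c, l in zip(centred, v)) for v in loadings]
        preds.append(y_mean + sum(g * s for g, s in zip(gammas, scores)))
    return preds

# Toy data (assumed): three perfectly collinear bands driven by one concentration.
conc = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
X = [[c, 2 * c, 3 * c] for c in conc]
y = [1.0 + 10.0 * c for c in conc]

model = fit_pcr(X, y, n_components=1)
fitted = predict_pcr(model, X)
new_pred = predict_pcr(model, [[0.7, 1.4, 2.1]])[0]
```

In this toy case the bands are perfectly collinear, so ordinary MLR on the raw bands would have no unique solution, while a single PC carries all of the spectral variation. With real spectra the number of components retained is a modeling choice, typically guided by the validation group.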
2.2.3 Partial least squares regression
Introduced in 1983 by Wold et al., partial least squares regression (PLS) is similar to PCR,
but in PLS the PCs are constructed such that they include the chemical reference (Y
variables, dependent data) in the calculation process. This technique orders the PCs
according to their relevance for predicting the dependent variables, rather than by how
much of the spectral variance they describe. This method excels when the