Biomedical Engineering Reference
In-Depth Information
Chemometrics with R, from the book Chemometrics with R:
Multivariate Data Analysis in the Natural Sciences and Life Sciences
by R Wehrens [39]. This package contains PCA and MCR routines.
Chemometrics. This package is the R companion to the book
Introduction to Multivariate Statistical Analysis in Chemometrics by
K Varmuza and P Filzmoser (2009) [40]. It includes PCA, PLS,
clustering, self-organising maps and support vector machines.
pls by R Wehrens and B-H Mevik [41]. Contains both PLS and PCR
methods. This package is easily adapted for PLS-DA by using a categorical
Y variable denoting class membership (e.g. 0 = control, 1 = treated).
pcaMethods [42], initiated at the Max-Planck Institute for Molecular
Plant Physiology, Golm, Germany. Now developed at CAS-MPG
Partner Institute for Computational Biology (PICB) Shanghai, P.R.
China and RIKEN Plant Science Center, Yokohama, Japan. pcaMethods
provides several alternative PCA methods that can handle missing data,
including NIPALS, and supports cross-validation.
Kopls [38]. An implementation of the kernel-based orthogonal
projections to latent structures (K-OPLS) method for MATLAB and R.
The package includes cross-validation, kernel parameter optimisation,
model diagnostics and plot tools.
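
The PLS-DA adaptation noted for the pls package above comes down to two steps: dummy-coding class membership as a 0/1 Y variable, then thresholding the predicted value. The following is a minimal sketch in plain Python rather than R. It does not use the pls package API, and the function names, the single-component model, the 0.5 threshold and the toy data are all assumptions made for illustration.

```python
import math

def fit_one_component_plsda(X, labels, positive="treated"):
    """Fit a one-component PLS model to a 0/1 dummy-coded class vector."""
    # Dummy-code class membership: 1 for the positive class, 0 otherwise.
    y = [1.0 if lab == positive else 0.0 for lab in labels]
    n, p = len(X), len(X[0])
    # Mean-centre X and y.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores, and the regression coefficient of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}

def predict_class(model, x, threshold=0.5,
                  positive="treated", negative="control"):
    """Predict the continuous Y value and threshold it into a class label."""
    score = sum((xj - mj) * wj
                for xj, mj, wj in zip(x, model["x_mean"], model["w"]))
    y_hat = model["y_mean"] + model["q"] * score
    return positive if y_hat > threshold else negative

# Toy data (made up): the first variable differs between the two groups.
X = [[1.0, 5.1, 0.2], [1.2, 4.9, 0.1], [0.9, 5.0, 0.3],
     [3.1, 5.2, 0.2], [2.9, 4.8, 0.1], [3.0, 5.0, 0.3]]
labels = ["control"] * 3 + ["treated"] * 3

model = fit_one_component_plsda(X, labels)
predictions = [predict_class(model, row) for row in X]
print(predictions)
```

Predicting the training samples back, as done here, says nothing about predictive power; that requires the cross-validation discussed in the next section.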
4.7.1 Important considerations with multivariate analysis
The most critical aspect of multivariate analysis is the ability to estimate
the predictive power, or model stability. This is usually implemented
using cross-validation [45] where some data are sequentially left out of
the model and the model re-calculated. The left-out data are then
estimated from the model and the differences are summarised in a
parameter called Q2, the predictive variance. Without an estimate of
predictivity, there is no objective way to choose the optimum number of
components, or even to tell whether any components are predictive at all. The
variance explained, or R2, of a model keeps increasing with every
component, so there is a great danger of overfitting the model if this
is the only criterion used to judge it.
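
The contrast between R2 and Q2 can be illustrated with a small leave-one-out cross-validation. The sketch below is written in plain Python under several assumptions that do not come from the text: a single-response PLS model fitted with a NIPALS-style loop, Q2 taken as 1 - PRESS/TSS (a common definition), and simulated data in which only the first two of 30 variables carry signal. All function names are hypothetical.

```python
import math
import random

def fit_pls1(X, y, n_comp):
    """NIPALS-style PLS1. Returns the means and per-component (w, p, q)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        # Weights: direction of maximal covariance with the y residual.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(t[i] * yc[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        Xc = [[Xc[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def predict(model, x, n_comp):
    """Predict y for one sample using the first n_comp components."""
    x_mean, y_mean, comps = model
    xr = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps[:n_comp]:
        t = sum(a * b for a, b in zip(xr, w))
        y_hat += q * t
        xr = [a - t * b for a, b in zip(xr, load)]
    return y_hat

def r2_and_q2(X, y, max_comp):
    """R2 from the full fit, Q2 from leave-one-out predictions."""
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((v - y_bar) ** 2 for v in y)
    full = fit_pls1(X, y, max_comp)
    # One model per left-out sample, each fitted without that sample.
    loo = [fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], max_comp)
           for i in range(n)]
    r2, q2 = [], []
    for a in range(1, max_comp + 1):
        rss = sum((y[i] - predict(full, X[i], a)) ** 2 for i in range(n))
        press = sum((y[i] - predict(loo[i], X[i], a)) ** 2 for i in range(n))
        r2.append(1 - rss / tss)
        q2.append(1 - press / tss)
    return r2, q2

# Simulated data (an assumption): 20 samples, 30 variables, noisy response.
rng = random.Random(0)
n, p = 20, 30
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + row[1] + 0.5 * rng.gauss(0, 1) for row in X]

r2, q2 = r2_and_q2(X, y, max_comp=5)
for a, (r, q) in enumerate(zip(r2, q2), start=1):
    print(f"components={a}  R2={r:.3f}  Q2={q:.3f}")
```

In this sketch R2 cannot decrease as components are added, whereas Q2 is free to level off or fall, which is why Q2 rather than R2 is the natural guide to the number of components.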
The ability to estimate predictivity becomes of paramount importance
when using supervised methods such as PLS-DA. Without a measure of
Q2 it is possible to obtain discriminant models that are effectively
worthless, for example models that appear to separate classes in random data [45]. In
addition to cross-validation, permutation testing is also a highly effective