multiple variables (within each block) are measuring the same factor. The problem posed by correlated independent variables in a multiple regression is sometimes solved by conducting a preliminary PCA to obtain uncorrelated variables (the PCs), then regressing the dependent variable (e.g. shape) on the PCs. As a result, the construction of the independent variables is determined only by the covariances among them, without considering their relationship to the dependent variable. In PLS, the axes for both blocks are determined by the covariances between the blocks, which can yield axes that need not correspond to the PCs within blocks.
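To make the contrast concrete, here is a minimal numerical sketch (the data and variable names are invented for illustration, not taken from the text): the PCs are built from the covariances within the X block alone, whereas the PLS axes are built from the covariances between the blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                 # block 1 (e.g. predictors)
Y = X @ rng.normal(size=(5, 4)) + rng.normal(size=(100, 4))   # block 2 (e.g. shape variables)
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

# Preliminary PCA: eigenanalysis of the within-block covariance of X only;
# the PCs ignore any relationship to Y.
eigvals, pcs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
pc_scores = Xc @ pcs[:, ::-1]                                 # uncorrelated regressors

# Regress the dependent block on the uncorrelated PC scores.
coefs, *_ = np.linalg.lstsq(pc_scores, Yc, rcond=None)

# PLS: SVD of the cross-block covariance; its axes are driven by the
# covariances between blocks and need not match the PCs.
R = Xc.T @ Yc / (len(X) - 1)
U, s, Vt = np.linalg.svd(R, full_matrices=False)              # U[:, 0] will generally differ from the first PC
```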
Another difference is that regression typically casts one set of variables as dependent
on the other, whereas PLS treats them symmetrically. That is, PLS does not assume that
one set of variables is independent and the other dependent. Both sets are treated as
jointly (and linearly) related to the same underlying causes. The symmetry of the method matters because Model 1 regression assumes that the independent variable is controlled, so all of its variation is explained by the experimental manipulation; it is measured without unexplained variation (“error”). Hence
all the unexplained variance in the data is ascribed to the dependent variable. No such
model underlies PLS and so no error is ascribed to any variables (in either block). For this
reason alone, we would not expect to obtain the same coefficients from PLS as we obtain
from regression.
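A small numerical illustration of that last point (invented data, not from the text): with a single dependent variable y, the first PLS axis for the X block is simply proportional to the vector of covariances between the X variables and y, whereas the OLS coefficients also fold in the covariances among the X variables, so the two directions generally differ.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
X = base + 0.5 * rng.normal(size=(200, 3))        # deliberately correlated predictors
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=200)
Xc, yc = X - X.mean(axis=0), y - y.mean()

# Asymmetric model: y regressed on X, all error ascribed to y.
ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]

# Symmetric PLS: with a single y, the first X-block axis is just the
# (normalized) vector of cross-covariances.
pls_axis = Xc.T @ yc
pls_axis /= np.linalg.norm(pls_axis)

print(ols / np.linalg.norm(ols))                  # the two directions differ
print(pls_axis)                                   # because the Xs are correlated
```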
There is, however, a form of PLS more comparable to regression, PLS Regression (Wold et al., 2001). This method uses the basic machinery of PLS, the SVD of the cross-block covariance matrix R (Equation 7.1), as the initial step in the procedure. That first
step yields the pair of linear combinations, SA1, for the two blocks, plus the scores for the
paired SA1s. Then, instead of regressing the first block, Y, on the second, X, Y is regressed
on the vector obtained from the scores for X (which may be normalized or otherwise
weighted). Further details on this method are beyond the scope of this chapter; there are
several algorithms for the procedure as well as several methods for obtaining the vector of
scores for the X block (see Mevik and Wehrens, 2007). When the variables in the X block (i.e. the predictors) are all uncorrelated, PLS Regression will be equivalent to ordinary least squares linear regression on those variables (Wold et al., 2001).
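That first step can be sketched as follows. This is a minimal one-component sketch of one simple variant among the several published algorithms; the function name and the unit-norm weighting of the X scores are our illustrative choices, not a standard.

```python
import numpy as np

def pls_regression_first_step(X, Y):
    """One-component sketch: SVD of the cross-block covariance,
    then regression of Y on the X-block scores (here unit-normalized)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (len(X) - 1)              # cross-block covariance (Equation 7.1)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    t = Xc @ U[:, 0]                          # scores on the first X-block axis (SA1)
    t /= np.linalg.norm(t)                    # one of several weighting choices
    coefs = t @ Yc                            # regress Y on the X scores, not on X itself
    return U[:, 0], Vt[0], coefs
```

Full PLS Regression then extracts further components from the residual blocks and repeats; for working implementations, see the software surveyed by Mevik and Wehrens (2007).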
PLS Compared to PCA
PLS and PCA resemble each other in one important respect: both reduce the dimensionality of the data by extracting a set of mutually orthogonal axes. As you recall from Chapter 6, PCs are extracted from a variance-covariance matrix (by eigenanalysis), producing a set of mutually orthogonal dimensions (eigenvectors), ordered according to the
amount of variance each one explains. Similarly, PLS decomposes the cross-block covariance matrix into mutually orthogonal axes, ordered according to the amount of covariance between blocks explained
by each one. The most obvious difference is that PCA examines variation within a single
block of variables whereas PLS examines the covariation between blocks. Consequently,
one of the primary differences between PCs and SAs is that SAs, unlike PCs, come in pairs. For each singular value there is a pair of axes that, taken together, accounts for the patterns of covariances between blocks, as the sketch below illustrates.
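A compact sketch of that parallel (invented data): PCA orders one set of axes by the variance explained within a block, whereas PLS orders pairs of axes by the covariance explained between blocks.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 6))                                        # block 1
Y = X[:, :3] @ rng.normal(size=(3, 4)) + rng.normal(size=(150, 4))   # block 2
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

# PCA: one block, one set of orthogonal axes, ordered by variance.
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
print(eigvals[::-1])            # variances, largest first

# PLS: two blocks, paired axes (U[:, i] with Vt[i]), ordered by covariance.
U, s, Vt = np.linalg.svd(Xc.T @ Yc / (len(X) - 1), full_matrices=False)
print(s)                        # singular values, largest first
```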
But despite this obvious difference, both PLS and