COMPARING PLS TO OTHER METHODS
PLS resembles several other methods, including simple and multiple regression, principal components analysis (PCA), and canonical correlation analysis (CCA). PLS can also be
used to discriminate between groups, making it an alternative to discriminant function
analysis (DFA) and canonical variates analysis (CVA). The relationship between methods
is complicated because PLS can be approached from multiple perspectives, but we focus
on PLS as solving one particular sort of eigenstructure problem and having the constraints
on the directions of the SAs noted above, i.e. that they be mutually orthogonal. Below we
briefly compare methods so that you can decide which is most useful for your purposes.
PLS Compared to Multiple Regression
Both PLS and multiple regression can examine the relationship between two multivariate sets of variables, but they differ in two important respects. First, and most importantly,
PLS does not require that the variables in either block be uncorrelated with each other,
and works most effectively when they are not, whereas multiple regression has difficulty
determining the variance explained by highly correlated predictive variables. In PLS, the
correlations among the variables are thought to reflect their joint response to underlying
(unobserved) variables, often called “latent variables” (a concept frequently used in PLS).
To estimate a latent variable, it is important to have multiple observed variables because
their correlations are explained by their dependence on the latent variable. For example, to
measure the latent variable “climate” we would use multiple observed climatic variables
(e.g. maximum monthly temperature, minimum monthly temperature, maximum monthly
precipitation, minimum monthly precipitation, seasonality, etc.). The correlations among
them are explained by “climate”. Rather than exploring the structure of these measurements within a block to extract that latent variable, PLS seeks the combination of the climate variables that maximally covaries with the other block of variables, that is, the linear combination of climate variables most relevant for explaining the other block. The coefficients of that combination are called saliences because they indicate which variables in one block are most relevant (salient) for explaining covariation with the other block. The ability to find that combination is enhanced by having multiple correlated observed variables.
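To make this concrete, the following sketch (not part of the original text; the data and variable names are invented for illustration) computes two-block PLS in the way described above, with the saliences obtained as the singular vectors of the cross-covariance matrix between the two blocks:

import numpy as np

rng = np.random.default_rng(0)
n = 50                                     # number of specimens

# Simulate an unobserved latent variable ("climate") that drives the observed
# variables in both blocks, so the variables within each block are correlated.
latent = rng.normal(size=n)
climate = np.outer(latent, rng.normal(size=5)) + 0.5 * rng.normal(size=(n, 5))
shape = np.outer(latent, rng.normal(size=8)) + 0.5 * rng.normal(size=(n, 8))

# Center each block so the cross-product matrix is a cross-covariance matrix.
X = climate - climate.mean(axis=0)
Y = shape - shape.mean(axis=0)
R = X.T @ Y / (n - 1)                      # cross-covariance between blocks

# The left and right singular vectors are the saliences: the coefficients of the
# linear combinations of each block that maximally covary with each other. Within
# each block the singular vectors are mutually orthogonal, matching the constraint
# on the singular axes (SAs) noted above.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

sa1_climate = X @ U[:, 0]                  # scores on the first climate SA
sa1_shape = Y @ Vt[0, :]                   # scores on the first shape SA

print("climate saliences (first SA):", np.round(U[:, 0], 3))
print("correlation of paired SA scores:", np.corrcoef(sa1_climate, sa1_shape)[0, 1])
print("fraction of squared covariance explained:", s[0]**2 / np.sum(s**2))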
In striking contrast, each coefficient of a multiple regression expresses the dependence of the dependent variables (e.g. shape) on one independent variable, with all the others held constant. Consequently, correlations between the independent variables are a problem for the method. When the independent variables are correlated with each other, most of the variance in the dependent variables will be associated with one independent variable, the one entered first into the model, leaving little to be explained by the others, as discussed in the chapter on General Linear Models. Even though all the independent variables might affect the dependent variables, only one might be accorded a high weight, making the others appear to have trivial explanatory power. That is because they explain only the residual variance, i.e. the variance not already explained by the variable with the large coefficient. Multiple regression is thus poorly suited to cases in which the independent variables are correlated with each other. In contrast, PLS is specifically intended for the case in which the variables within each block are correlated.
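A small numerical sketch (hypothetical, not from the text) of the problem just described: when two predictors are highly correlated, whichever one is entered first absorbs nearly all of the explainable variance, and the second appears to add almost nothing even though both truly affect the response.

import numpy as np

rng = np.random.default_rng(1)
n = 200

x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # x2 is highly correlated with x1
y = x1 + x2 + 0.5 * rng.normal(size=n)       # both predictors truly affect y

def r_squared(predictors, y):
    # R^2 from an ordinary least-squares fit with an intercept.
    X = np.column_stack([np.ones(len(y))] + predictors)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

r2_x1 = r_squared([x1], y)          # variance explained by x1 alone
r2_both = r_squared([x1, x2], y)    # variance explained by x1 and x2 together

print("R^2 with x1 only:        ", round(r2_x1, 3))
print("extra R^2 from adding x2:", round(r2_both - r2_x1, 3))   # nearly zero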