PCA impose a similar constraint on the analysis: both define axes to be mutually orthogo-
nal. Just as PC2 is defined to be orthogonal to PC1, SV2 is defined to be orthogonal to SV1.
This is important when biological factors are not orthogonal, which may be the general
rule. Even though the axes (both PCs and SAs) provide a useful, simplified space in which
to explore patterns in the data, the axes themselves, beyond the first, need not correspond
to any biological factors. It is likely that PC1 and SA1 have a biological interpretation
when they account for a very large proportion of the variance or covariance, but the
remaining axes are, by definition, constrained to be orthogonal to them, making their
interpretation more dubious. This same issue arises when using PCA for explanatory or
even comparative purposes (see Rohlf and Corti, 2000; Houle et al., 2002; Angielczyk and
Sheets, 2007).
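The orthogonality constraint can be seen in a small numerical sketch. The example below assumes that the singular axes come from a singular value decomposition of the between-block covariance matrix. It uses simulated data driven by two deliberately correlated factors. The function name `two_block_pls`, the factor loadings, the noise level, and the sample size are hypothetical choices made only for illustration.

```python
import numpy as np

def two_block_pls(X, Y):
    """Singular axes and singular values of the between-block covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center each block
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (X.shape[0] - 1)        # between-block covariance (p x q)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U, s, Vt.T                       # axes for X, singular values, axes for Y

rng = np.random.default_rng(0)
n = 200

# Two simulated "biological" factors that are correlated, i.e. not orthogonal.
f1 = rng.normal(size=n)
f2 = 0.7 * f1 + 0.5 * rng.normal(size=n)
factor_corr = np.corrcoef(f1, f2)[0, 1]

# Each block mixes both factors and adds a little measurement noise.
X = np.column_stack([f1, f2, f1 - f2]) + 0.1 * rng.normal(size=(n, 3))
Y = np.column_stack([f1 + f2, f2]) + 0.1 * rng.normal(size=(n, 2))

U, s, V = two_block_pls(X, Y)

print("correlation between simulated factors:", round(float(factor_corr), 2))
print("dot product of axes 1 and 2, block X:", float(U[:, 0] @ U[:, 1]))
print("dot product of axes 1 and 2, block Y:", float(V[:, 0] @ V[:, 1]))
```

The simulated factors are strongly correlated, yet the dot products between successive axes are zero to within floating-point error. The decomposition imposes orthogonality regardless of how the underlying factors relate to each other.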
Another important similarity between the methods, which also should inspire a cau-
tious approach to interpreting results, is that PLS extracts linear combinations of variables
(like PCA) but the relationship between blocks may be non-linear. In such cases, the first
dimension may represent the dominant linear trend, and others represent orthogonal
deviations from linearity. Thus, we would need to interpret SV1 together with SV2 to
understand the relationship between the two blocks, recognizing that a single non-linear
factor accounts for both. Of course, linearity matters whether we are analyzing the data
by PCA or PLS, by regression, or by the method discussed in the following section, CCA.
However, most workers recognize that linearity is an important assumption of regression,
whereas non-linearity might not seem so important in studies using PCA or PLS. Because
neither method is explicitly based on a linear model, non-linear relationships among
variables might not seem to violate the assumptions of the method.
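A toy example may help show how a single non-linear factor can spread across two orthogonal axes. The sketch below uses PCA rather than PLS to keep it short, on the assumption that the same logic carries over because both methods extract linear combinations. The simulated factor `t`, the quadratic relationship, and the noise level are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# One underlying factor t drives both variables; the second depends on t non-linearly.
t = np.linspace(0.0, 1.0, 300)
data = np.column_stack([t, t**2]) + 0.01 * rng.normal(size=(300, 2))

# PCA via singular value decomposition of the centered data.
Z = data - data.mean(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T
pc1, pc2 = scores[:, 0], scores[:, 1]

# PC1 and PC2 scores are linearly uncorrelated by construction...
linear_corr = np.corrcoef(pc1, pc2)[0, 1]

# ...PC1 tracks the dominant (roughly linear) trend in the factor...
factor_corr = np.corrcoef(t, pc1)[0, 1]

# ...and PC2 is still largely a (quadratic) function of that same factor.
coeffs = np.polyfit(t, pc2, deg=2)
residuals = pc2 - np.polyval(coeffs, t)
r_squared = 1.0 - residuals.var() / pc2.var()

print("corr(PC1, PC2):", float(linear_corr))
print("|corr(t, PC1)|:", abs(float(factor_corr)))
print("R^2 of PC2 on a quadratic in t:", float(r_squared))
```

In this simulation, PC2 is not a second factor. It records the curvature left over once PC1 has taken up the linear trend, so the two axes have to be read together.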
Unlike the situation for PCA, there is no analytic statistical test of the significance of
SAs, meaning that there is no analytic test for the difference in length between SA1 and
SA2, and so forth. However, as mentioned above, resampling-based approaches can be
applied to test the hypothesis that SA1 (and succeeding SAs) explain more covariance
than expected by chance. A permutation test, discussed by Rohlf and Corti (2000), determines
whether the singular values are larger than could be produced by a random permutation
of associations among variables between blocks (keeping within-block associations
intact).
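The permutation logic described above can be sketched as follows. This is a minimal illustration, not the procedure as implemented by Rohlf and Corti (2000) or by any particular software. It assumes that the singular values come from the between-block covariance matrix, that whole rows (specimens) of one block are shuffled so that within-block associations stay intact, and that the k-th permuted singular value is compared with the k-th observed one. The function names, the number of permutations, and the simulated data are hypothetical.

```python
import numpy as np

def singular_values(X, Y):
    """Singular values of the between-block covariance matrix."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (X.shape[0] - 1)
    return np.linalg.svd(R, compute_uv=False)

def permutation_test(X, Y, n_perm=999, seed=0):
    """Return observed singular values and their permutation p-values."""
    rng = np.random.default_rng(seed)
    observed = singular_values(X, Y)
    # The observed arrangement counts as one of the possible permutations.
    count = np.ones_like(observed)
    for _ in range(n_perm):
        # Shuffle whole rows of Y: associations within X and within Y are
        # preserved, while associations between the blocks are randomized.
        perm = rng.permutation(X.shape[0])
        count += singular_values(X, Y[perm]) >= observed
    return observed, count / (n_perm + 1)

# Simulated example: one shared factor links the two blocks.
rng = np.random.default_rng(42)
n = 100
f = rng.normal(size=n)
X = np.outer(f, [1.0, 0.5, -0.5]) + 0.3 * rng.normal(size=(n, 3))
Y = np.outer(f, [1.0, -1.0]) + 0.3 * rng.normal(size=(n, 2))

observed, p_values = permutation_test(X, Y)
print("observed singular values:", observed)
print("permutation p-values:", p_values)
```

With 999 permutations the smallest attainable p-value is 1/1000, so the number of permutations sets the resolution of the test.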
PLS Compared to Canonical Correlation Analysis
Canonical correlation analysis (CCA) examines the correlation between blocks of variables and
closely resembles multiple regression, although, as in the case of PLS, CCA treats both
blocks symmetrically. CCA thus differs from multiple regression and resembles PLS in
that neither block is construed as comprising a block of causal variables with the other
comprising the responses. One important difference between CCA and PLS is the quantity
maximized by the two procedures. CCA seeks pairs of axes (canonical axes) that are maxi-
mally correlated with each other. That is, CCA seeks an axis, a linear combination of vari-
ables, from one block that is maximally correlated with a linear combination of variables
from the other block. In contrast, PLS seeks axes that maximally account for the covariance
between blocks (for a more detailed comparison between CCA and PLS, see Rohlf and