$$\max_{\mathbf{p}_1}\; \mathbf{p}_1^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{X}\,\mathbf{p}_1 \quad \text{subject to} \quad \mathbf{p}_1^{\mathsf{T}}\mathbf{p}_1 = 1.0 \qquad (3.1)$$
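The solution of Equation 3.1 is the eigenvector of $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ associated with its largest eigenvalue. As a minimal numerical check (a sketch only, assuming a mean-centered data matrix; the random data and variable names are illustrative), the leading eigenvector attains a larger objective value than any other unit vector:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X -= X.mean(axis=0)                    # mean-center, as discussed later

# Eigendecomposition of X'X; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
p1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

objective = lambda p: p @ X.T @ X @ p  # p'X'Xp, with p'p = 1

# Any other unit vector yields a smaller objective value.
random_p = rng.standard_normal(5)
random_p /= np.linalg.norm(random_p)
assert objective(p1) >= objective(random_p)
print(objective(p1), eigvals[-1])      # the maximum equals the largest eigenvalue
```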
A first summary variable or score, $\mathbf{t}_1$, is obtained simply by projecting $\mathbf{X}$ in the direction of $\mathbf{p}_1$: $\mathbf{t}_1 = \mathbf{X}\mathbf{p}_1$. This high-variance direction is then removed from $\mathbf{X}$, leaving a residual matrix $\mathbf{E}_1 = \mathbf{X} - \mathbf{t}_1\mathbf{p}_1^{\mathsf{T}}$, which contains the variance of $\mathbf{X}$ that is not explained by the first component. The construction of the PCA model can continue with the computation of a second linear combination $\mathbf{p}_2$, explaining the second highest amount of variance in $\mathbf{X}$. The objective in this case is the same as shown in Equation 3.1, but replacing $\mathbf{p}_1$ by $\mathbf{p}_2$ and $\mathbf{X}$ by $\mathbf{E}_1$, and imposing the additional constraint that the second component be orthogonal to the first one ($\mathbf{p}_1^{\mathsf{T}}\mathbf{p}_2 = 0$). This procedure is repeated until the desired number of components is computed. The final structure of the model is $\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}$, which can be seen as an eigenvector or singular value decomposition (SVD) of $\mathbf{X}^{\mathsf{T}}\mathbf{X}$. In fact, the $\mathbf{p}$ vectors are just the eigenvectors of $\mathbf{X}^{\mathsf{T}}\mathbf{X}$, and the $\mathbf{t}$ vectors are the eigenvectors of $\mathbf{X}\mathbf{X}^{\mathsf{T}}$. When as many components are computed as there are variables (i.e., $A = J$), the decomposition of $\mathbf{X}$ is exact and $\mathbf{E} = \mathbf{0}$.
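As a concrete illustration, the full decomposition can be obtained directly from the SVD of $\mathbf{X}$. The following sketch (a minimal example assuming numpy; the data, sizes, and variable names are illustrative) builds $\mathbf{T}$ and $\mathbf{P}$ for $A$ components and verifies that the residual vanishes when $A = J$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))       # N = 50 observations, J = 4 variables
X -= X.mean(axis=0)                    # mean-centered, as assumed throughout

# SVD: X = U S V'; the columns of V are the loading vectors p,
# and T = U S = X V holds the scores t.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

A = 2                                  # number of components retained
P = Vt[:A].T                           # loadings, J x A
T = X @ P                              # scores,   N x A
E = X - T @ P.T                        # residual matrix

print(np.linalg.norm(E))               # nonzero: two components left out

# With A = J the decomposition is exact and E = 0 (up to round-off).
P_full = Vt.T
E_full = X - (X @ P_full) @ P_full.T
assert np.allclose(E_full, 0)
```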
An alternative approach for computing the $\mathbf{p}$ and $\mathbf{t}$ vectors sequentially is to use the nonlinear iterative partial least squares (NIPALS) algorithm. The starting point of this algorithm typically (though not strictly required) consists of mean-centering and scaling the matrix $\mathbf{X}$ (discussed later in this section). The steps are as follows:
1. set $\mathbf{t}$ to be one column of $\mathbf{X}$;
2. $\mathbf{p} = \mathbf{X}^{\mathsf{T}}\mathbf{t}/(\mathbf{t}^{\mathsf{T}}\mathbf{t})$;
3. $\mathbf{p} = \mathbf{p}/(\mathbf{p}^{\mathsf{T}}\mathbf{p})^{1/2}$;
4. $\mathbf{t} = \mathbf{X}\mathbf{p}/(\mathbf{p}^{\mathsf{T}}\mathbf{p})$;
5. continue iterating between steps 2 and 4 until convergence of $\mathbf{t}$ or $\mathbf{p}$;
6. residual matrix: $\mathbf{E} = \mathbf{X} - \mathbf{t}\mathbf{p}^{\mathsf{T}}$;
7. store $\mathbf{p}$ and $\mathbf{t}$ in $\mathbf{P}$ and $\mathbf{T}$, respectively;
8. calculate the next dimension by returning to step 1, using $\mathbf{E}$ as the new $\mathbf{X}$.
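A direct transcription of these steps into code might look like the following sketch (a minimal implementation assuming a mean-centered, scaled numpy array; the convergence tolerance and iteration cap are illustrative choices, not part of the algorithm as stated):

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Sequential NIPALS extraction of PCA scores T and loadings P."""
    X = X.copy()
    N, J = X.shape
    T = np.zeros((N, n_components))
    P = np.zeros((J, n_components))
    for a in range(n_components):
        t = X[:, 0].copy()                       # step 1: start from a column of X
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)                # step 2: p = X't / (t't)
            p /= np.sqrt(p @ p)                  # step 3: normalize p to unit length
            t_new = X @ p / (p @ p)              # step 4: t = Xp / (p'p)
            if np.linalg.norm(t_new - t) < tol:  # step 5: convergence of t
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)                   # step 6: deflate; E becomes the new X
        T[:, a], P[:, a] = t, p                  # step 7: store t and p in T and P
    return T, P                                  # step 8 is handled by the outer loop

# Usage: loadings agree (up to sign) with those obtained from an SVD of X.
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 5))
X -= X.mean(axis=0)
T, P = nipals_pca(X, n_components=2)
```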
After computing each latent variable, one needs to decide whether another dimension should be added to the PCA model. Cross-validation [16] is a typically used criterion for selecting the number of components to keep in the model.
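Proper cross-validation requires the deletion scheme described in [16]; as a simpler stand-in for illustration, the sketch below uses the cumulative fraction of explained variance, another common heuristic for choosing the number of components (the 90% threshold is an arbitrary illustrative choice, not a recommendation from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 6))
X -= X.mean(axis=0)

# Each squared singular value is the variance captured by one component.
s = np.linalg.svd(X, compute_uv=False)
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Keep the smallest number of components reaching the chosen threshold.
A = int(np.searchsorted(cumulative, 0.90)) + 1
print(cumulative, A)
```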
3.2.2 Projection to Latent Structures (PLS)
Projection to latent structures, also known as partial least squares, is a truly multivariate latent variable regression method. PLS is used to model relationships both within and between two blocks of data, $\mathbf{X}$ and $\mathbf{Y}$. A tutorial on PLS can be found in [17], and a review of PLS history is available in [18]. Some mathematical and statistical properties of PLS are also addressed in [19, 20].
In PLS, the covariance structures of $\mathbf{X}$ and $\mathbf{Y}$ are modeled via a set of $A$ latent variables, $\mathbf{t}$ and $\mathbf{u}$ respectively, as shown in Figure 3.2(b). However, these latent